From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model

Abstract

Vehicle crashes involve complex interactions between road users, split-second decisions, and challenging environmental conditions. Among these, two-vehicle crashes are the most prevalent, accounting for approximately 70% of roadway crashes and posing a significant challenge to traffic safety. Identifying Driver Hazardous Action (DHA) is essential for understanding crash causation, yet the reliability of DHA data in large-scale databases is limited by inconsistent and labor-intensive manual coding practices. Here, we present an innovative framework that leverages a fine-tuned large language model to automatically infer DHAs from textual crash narratives, thereby improving the validity and interpretability of DHA classifications. Using five years of two-vehicle crash data from MTCF, we fine-tuned the Llama 3.2 1B model on detailed crash narratives and benchmarked its performance against conventional machine learning classifiers, including Random Forest, XGBoost, CatBoost, and a neural network. The fine-tuned LLM achieved an overall accuracy of 80%, surpassing all baseline models and demonstrating pronounced improvements in scenarios with imbalanced data. To increase interpretability, we developed a probabilistic reasoning approach, analyzing model output shifts across original test sets and three targeted counterfactual scenarios: variations in driver distraction and age. Our analysis revealed that introducing distraction for one driver substantially increased the likelihood of "General Unsafe Driving"; distraction for both drivers maximized the probability of "Both Drivers Took Hazardous Actions"; and assigning a teen driver markedly elevated the probability of "Speed and Stopping Violations." Our framework and analytical methods provide a robust and interpretable solution for large-scale automated DHA detection, offering new opportunities for traffic safety analysis and intervention.
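
As a rough illustration of the fine-tuning step described in the abstract, the sketch below uses Hugging Face transformers with a LoRA adapter on Llama 3.2 1B. The label set (only partially named above), the toy narrative records, the hyperparameters, and the "dha-llama" output path are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of DHA classification fine-tuning; labels, records,
# hyperparameters, and the output path are illustrative assumptions only.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "meta-llama/Llama-3.2-1B"  # base model named in the paper
LABELS = [  # illustrative DHA categories drawn from the abstract
    "No Hazardous Action",
    "General Unsafe Driving",
    "Speed and Stopping Violations",
    "Both Drivers Took Hazardous Actions",
]

# Toy records standing in for MTCF crash narratives with coded DHA labels.
train_records = [
    {"narrative": "Driver 1 was texting and rear-ended Driver 2 at a red light.", "label": 1},
    {"narrative": "Driver 2 exceeded the speed limit and ran the stop sign, striking Driver 1.", "label": 2},
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token

model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=len(LABELS))
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA keeps the 1B backbone frozen and trains small adapters plus the new
# classification head, a common parameter-efficient recipe at this model size.
model = get_peft_model(
    model,
    LoraConfig(task_type="SEQ_CLS", r=16, lora_alpha=32,
               target_modules=["q_proj", "v_proj"]),
)

def tokenize(batch):
    return tokenizer(batch["narrative"], truncation=True, max_length=512)

train_ds = Dataset.from_list(train_records).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dha-llama", per_device_train_batch_size=8,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=train_ds,
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()

# Fold the LoRA updates back into the base weights and save a standalone classifier.
trainer.model.merge_and_unload().save_pretrained("dha-llama")
tokenizer.save_pretrained("dha-llama")
```

In practice the narratives would come from the five years of MTCF two-vehicle crash data described above, and the label set, LoRA rank, sequence length, and epoch count would be tuned on a held-out split.
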
Authors (9)
Boyou Chen
Gerui Xu
Zifei Wang
Huizhong Guo
Ananna Ahmed
Zhaonan Sun
+3 more
Submitted
October 14, 2025
arXiv Category
cs.AI

Key Contributions

Presents a framework that fine-tunes an LLM (Llama 3.2 1B) to automatically infer Driver Hazardous Actions (DHAs) from textual crash narratives, improving the validity and interpretability of DHA data, and benchmarks the approach against conventional machine learning classifiers on two-vehicle crashes.
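
To make the counterfactual probability-shift analysis from the abstract concrete, the sketch below scores an original narrative and an edited version that introduces distraction for one driver, then compares the resulting class distributions. The checkpoint path, label names, and example narratives carry over from the fine-tuning sketch above and are assumptions, not the authors' data.

```python
# Hypothetical sketch of the probability-shift comparison; the checkpoint path,
# label names, and example narratives are assumptions from the sketch above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "dha-llama"  # assumed path to the fine-tuned classifier
LABELS = [
    "No Hazardous Action",
    "General Unsafe Driving",
    "Speed and Stopping Violations",
    "Both Drivers Took Hazardous Actions",
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).eval()

def class_probs(narrative: str) -> torch.Tensor:
    """Return the softmax distribution over DHA classes for one narrative."""
    inputs = tokenizer(narrative, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.softmax(dim=-1).squeeze(0)

original = "Driver 1 changed lanes on the freeway and collided with Driver 2."
# Counterfactual edit: introduce distraction for Driver 1 only.
counterfactual = original + " Driver 1 was looking at a phone at the time."

p_orig, p_cf = class_probs(original), class_probs(counterfactual)
for label, po, pc in zip(LABELS, p_orig, p_cf):
    print(f"{label:38s} {po.item():.3f} -> {pc.item():.3f} (shift {(pc - po).item():+.3f})")
```

Aggregating these per-narrative shifts over the test set, and repeating the edit for the other counterfactuals (distraction for both drivers, a teen driver), is the kind of analysis behind the reported finding that single-driver distraction raises the probability of "General Unsafe Driving."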

Business Value

Enhances traffic safety by providing more accurate and reliable data on crash causes, enabling better prevention strategies, insurance risk assessment, and policy making.