📄 Abstract
Vehicle crashes involve complex interactions between road users, split-second
decisions, and challenging environmental conditions. Among these, two-vehicle
crashes are the most prevalent, accounting for approximately 70% of roadway
crashes and posing a significant challenge to traffic safety. Identifying
Driver Hazardous Action (DHA) is essential for understanding crash causation,
yet the reliability of DHA data in large-scale databases is limited by
inconsistent and labor-intensive manual coding practices. Here, we present an
innovative framework that leverages a fine-tuned large language model to
automatically infer DHAs from textual crash narratives, thereby improving the
validity and interpretability of DHA classifications. Using five years of
two-vehicle crash data from MTCF, we fine-tuned the Llama 3.2 1B model on
detailed crash narratives and benchmarked its performance against conventional
machine learning classifiers, including Random Forest, XGBoost, CatBoost, and a
neural network. The fine-tuned LLM achieved an overall accuracy of 80%,
surpassing all baseline models and demonstrating pronounced improvements in
scenarios with imbalanced data. To increase interpretability, we developed a
probabilistic reasoning approach that compares model output probabilities on
the original test sets with those under three targeted counterfactual
scenarios: distraction for one driver, distraction for both drivers, and a
teen driver. Our analysis revealed that introducing distraction for one
driver substantially increased the likelihood of "General Unsafe Driving";
distraction for both drivers maximized the probability of "Both Drivers Took
Hazardous Actions"; and assigning a teen driver markedly elevated the
probability of "Speed and Stopping Violations." Our framework and analytical
methods provide a robust and interpretable solution for large-scale automated
DHA detection, offering new opportunities for traffic safety analysis and
intervention.
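
The counterfactual comparison described in the abstract can be illustrated with a minimal sketch. It assumes the classifier exposes one logit per DHA class for each narrative, and that each original narrative is paired with an edited counterfactual version (for example, with a distraction flag added). The function names `softmax` and `mean_probability_shift` and the input layout are hypothetical and are not taken from the paper; the authors' actual procedure may differ.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_probability_shift(original_logits, counterfactual_logits):
    """Average per-class change in predicted probability.

    Both arguments are lists of per-narrative logit lists, paired by index:
    original_logits[i] and counterfactual_logits[i] come from the same crash
    narrative before and after the counterfactual edit (assumed layout).
    A positive entry means the class became more likely under the edit.
    """
    if len(original_logits) != len(counterfactual_logits):
        raise ValueError("inputs must contain the same number of narratives")
    n = len(original_logits)
    num_classes = len(original_logits[0])
    shifts = [0.0] * num_classes
    for orig, cf in zip(original_logits, counterfactual_logits):
        p_orig = softmax(orig)
        p_cf = softmax(cf)
        for k in range(num_classes):
            shifts[k] += (p_cf[k] - p_orig[k]) / n
    return shifts
```

Because each probability vector sums to 1, the per-class shifts always sum to zero, so an increase for one DHA class is offset by decreases elsewhere.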
Authors (9)
Boyou Chen
Gerui Xu
Zifei Wang
Huizhong Guo
Ananna Ahmed
Zhaonan Sun
+3 more
Submitted
October 14, 2025
Key Contributions
Presents a framework using a fine-tuned LLM (Llama 3.2 1B) to automatically infer Driver Hazardous Actions (DHAs) from textual crash narratives, improving data validity and interpretability. It benchmarks this approach against traditional ML classifiers for two-vehicle crashes.
Business Value
Enhances traffic safety by providing more accurate and reliable data on crash causes, enabling better prevention strategies, insurance risk assessment, and policy making.