Abstract
Crash detection from video feeds is a critical problem in intelligent
transportation systems. Recent developments in large language models (LLMs) and
vision-language models (VLMs) have transformed how we process, reason about,
and summarize multimodal information. This paper surveys recent methods
leveraging LLMs for crash detection from video data. We present a structured
taxonomy of fusion strategies, summarize key datasets, analyze model
architectures, compare performance benchmarks, and discuss ongoing challenges
and opportunities. Our review provides a foundation for future research in this
fast-growing intersection of video understanding and foundation models.