📄 Abstract
End-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm
that unifies perception, prediction, and planning into a holistic, data-driven
framework. However, achieving robustness to varying camera viewpoints, a common
real-world challenge due to diverse vehicle configurations, remains an open
problem. In this work, we propose VR-Drive, a novel E2E-AD framework that
addresses viewpoint generalization by jointly learning 3D scene reconstruction
as an auxiliary task to enable planning-aware view synthesis. Unlike prior
scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference
strategy that supports online training-time augmentation from sparse views
without additional annotations. To further improve viewpoint consistency, we
introduce a viewpoint-mixed memory bank that facilitates temporal interaction
across multiple viewpoints and a viewpoint-consistent distillation strategy
that transfers knowledge from original to synthesized views. Trained in a fully
end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and
improves planning under viewpoint shifts. In addition, we release a new
benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints,
enabling comprehensive analysis. Our results demonstrate that VR-Drive is a
scalable and robust solution for the real-world deployment of end-to-end
autonomous driving systems.
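The page does not include an implementation, but as a rough, hypothetical illustration of the viewpoint-consistent distillation idea described above, the sketch below aligns features from a synthesized-view branch (student) with detached features from the original-view branch (teacher). The function and tensor names (viewpoint_distillation_loss, feat_synth, feat_orig) are assumptions for illustration, not the authors' API.

```python
import torch
import torch.nn.functional as F


def viewpoint_distillation_loss(feat_synth: torch.Tensor,
                                feat_orig: torch.Tensor) -> torch.Tensor:
    """Hypothetical viewpoint-consistent distillation loss.

    feat_synth: features from the synthesized-view branch (student), shape (B, N, C)
    feat_orig:  features from the original-view branch (teacher), shape (B, N, C)
    The teacher features are detached so gradients flow only into the student,
    which pushes the synthesized-view representation toward the original view.
    """
    teacher = feat_orig.detach()
    # Directional agreement between student and teacher features.
    cos_term = 1.0 - F.cosine_similarity(feat_synth, teacher, dim=-1).mean()
    # Magnitude agreement, robust to outliers from synthesis-induced noise.
    l1_term = F.smooth_l1_loss(feat_synth, teacher)
    return cos_term + l1_term
```

In this kind of setup, the loss would be added to the usual planning objectives so the synthesized-view branch benefits from the original-view branch without extra annotations.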
Authors (7)
Hoonhee Cho
Jae-Young Kang
Giwon Lee
Hyemin Yang
Heejun Park
Seokwoo Jung
Submitted
October 27, 2025
Key Contributions
Proposes VR-Drive, a novel end-to-end autonomous driving framework that achieves viewpoint generalization by jointly learning 3D scene reconstruction for planning-aware view synthesis. It uses a feed-forward strategy for online augmentation and incorporates a viewpoint-mixed memory bank and distillation for consistency.
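As a minimal sketch of what a viewpoint-mixed memory bank could look like (assuming a FIFO buffer of per-frame features from both original and synthesized viewpoints; the class and method names below are hypothetical, not taken from the paper):

```python
import torch
from collections import deque


class ViewpointMixedMemoryBank:
    """Hypothetical memory bank that stores temporal features from multiple
    camera viewpoints in a single FIFO queue, so downstream queries can
    attend jointly across time and viewpoint."""

    def __init__(self, max_frames: int = 4):
        # Each stored entry is a feature tensor of shape (B, N, C).
        self.buffer: deque = deque(maxlen=max_frames)

    def push(self, feats: torch.Tensor) -> None:
        # Detach stored features so past frames do not receive gradients.
        self.buffer.append(feats.detach())

    def read(self) -> torch.Tensor:
        # Concatenate all stored frames along the token axis: (B, T*N, C).
        return torch.cat(list(self.buffer), dim=1)


# Usage sketch: push features from original and synthesized views each frame,
# then let planning queries cross-attend over bank.read().
```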
Business Value
Enhances the reliability and safety of autonomous driving systems by making them robust to different camera placements and perspectives, accelerating development and deployment.