Abstract
End-to-end autonomous driving systems increasingly rely on vision-centric world models to understand and predict their environment. However, a common inefficiency in these models is the full reconstruction of future scenes, which expends significant capacity on redundantly modeling static backgrounds. To address this, we propose IR-WM, an Implicit Residual World Model that focuses on modeling the current state and evolution of the world. IR-WM first establishes a robust bird's-eye-view (BEV) representation of the current state from the visual observation. It then leverages the BEV features from the previous timestep as a strong temporal prior and predicts only the "residual", i.e., the changes conditioned on the ego-vehicle's actions and scene context. To alleviate error accumulation over time, we further apply an alignment module that calibrates semantic and dynamic misalignments. Moreover, we investigate different forecasting-planning coupling schemes and demonstrate that the implicit future state generated by the world model substantially improves planning accuracy. On the nuScenes benchmark, IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.
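To make the residual formulation concrete, the sketch below shows what one forecasting step could look like. This is a minimal illustration assuming a PyTorch-style BEV pipeline; the module names, layer choices, and tensor shapes are our assumptions, not the authors' implementation. The key idea it instantiates is that the previous BEV state is carried forward as a temporal prior and only the action-conditioned change is predicted, with an alignment step applied before the residual is added.

```python
import torch
import torch.nn as nn

class ImplicitResidualStep(nn.Module):
    """Illustrative sketch of one residual forecasting step (not the paper's code).

    bev_dim and action_dim are assumed sizes; the real IR-WM architecture
    is not specified on this page.
    """

    def __init__(self, bev_dim: int = 256, action_dim: int = 8):
        super().__init__()
        # Predicts only the "residual" (the change in the BEV state),
        # conditioned on the previous BEV features and the ego action.
        self.residual_head = nn.Sequential(
            nn.Conv2d(bev_dim + action_dim, bev_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(bev_dim, bev_dim, kernel_size=3, padding=1),
        )
        # Stand-in for the alignment module: calibrates semantic/dynamic
        # drift in the carried-over state to limit error accumulation.
        self.align = nn.Conv2d(bev_dim, bev_dim, kernel_size=1)

    def forward(self, bev_prev: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # bev_prev: (B, C, H, W) BEV features from the previous timestep.
        # action:   (B, A) ego action, broadcast over the spatial grid.
        B, _, H, W = bev_prev.shape
        act_map = action[:, :, None, None].expand(B, -1, H, W)
        residual = self.residual_head(torch.cat([bev_prev, act_map], dim=1))
        # Previous state serves as the temporal prior; only the change is learned.
        return self.align(bev_prev) + residual
```

Rolled out autoregressively (feeding each output back in as `bev_prev`), such a step yields an implicit future state at each horizon without reconstructing the full scene.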
Authors (7)
Jianbiao Mei
Yu Yang
Xuemeng Yang
Licheng Wen
Jiajun Lv
Botian Shi
+1 more
Submitted
October 19, 2025
Key Contributions
The paper proposes IR-WM, an Implicit Residual World Model that predicts scene changes (residuals) rather than reconstructing full future scenes, reducing redundant computation on static backgrounds. It leverages the previous timestep's BEV features as a temporal prior and includes an alignment module to mitigate error accumulation, improving the accuracy of both 4D occupancy forecasting and planning.
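The forecasting-planning coupling can be pictured as follows. This is a hypothetical sketch, assuming the `ImplicitResidualStep` module from the earlier example and a placeholder `planner` head that scores an imagined future BEV state; the paper's actual coupling schemes are not detailed on this page.

```python
def plan_with_world_model(step, planner, bev_t, candidate_actions):
    """Score candidate ego actions by rolling the world model forward one step.

    `step` is the residual forecasting module sketched above; `planner` is a
    hypothetical head mapping an imagined future BEV state to a trajectory
    score. Both are illustrative, not the paper's design.
    """
    scores = []
    for action in candidate_actions:
        bev_future = step(bev_t, action)    # implicit future state
        scores.append(planner(bev_future))  # evaluate the imagined future
    return scores
```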
Business Value
Enhances the safety, efficiency, and reliability of autonomous driving systems by enabling more accurate prediction of dynamic environments and better integration with planning modules.