Abstract
This paper proposes AutoScape, a long-horizon driving scene generation
framework. At its core is a novel RGB-D diffusion model that iteratively
generates sparse, geometrically consistent keyframes, serving as reliable
anchors for the scene's appearance and geometry. To maintain long-range
geometric consistency, the model 1) jointly handles image and depth in a shared
latent space, 2) explicitly conditions on the existing scene geometry (i.e.,
rendered point clouds) from previously generated keyframes, and 3) steers the
sampling process with a warp-consistent guidance. Given high-quality RGB-D
keyframes, a video diffusion model then interpolates between them to produce
dense and coherent video frames. AutoScape generates realistic and
geometrically consistent driving videos of over 20 seconds, improving the
long-horizon FID and FVD scores over the prior state of the art by 48.6% and
43.0%, respectively.
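The abstract describes warp-consistent guidance only at a high level, so the sketch below is an assumption-based illustration rather than the paper's implementation. It shows one plausible form of a warp-consistency term: a previously generated RGB-D keyframe is unprojected with its depth, reprojected into the new camera, and the newly generated view is sampled at those locations and compared against the keyframe colors; such a loss could in principle be used to steer the diffusion sampling. All names and interfaces (warp_consistency_loss, T_kf_to_new, shared intrinsics K) are hypothetical.

```python
import torch
import torch.nn.functional as F


def warp_consistency_loss(kf_rgb, kf_depth, new_rgb, K, T_kf_to_new):
    """Photometric consistency between a previous RGB-D keyframe and a newly
    generated view, via depth-based reprojection (hypothetical interface).

    kf_rgb:      (3, H, W) previous keyframe image
    kf_depth:    (H, W)    previous keyframe depth
    new_rgb:     (3, H, W) current estimate of the new view
    K:           (3, 3)    shared camera intrinsics
    T_kf_to_new: (4, 4)    relative pose, keyframe camera -> new camera
    """
    _, H, W = kf_rgb.shape
    device = kf_rgb.device

    # Homogeneous pixel grid of the keyframe.
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)

    # Unproject keyframe pixels to 3D using the keyframe depth.
    pts_kf = torch.linalg.inv(K) @ pix * kf_depth.reshape(1, -1)
    pts_kf_h = torch.cat([pts_kf, torch.ones(1, H * W, device=device)], dim=0)

    # Move the points into the new camera and project them.
    pts_new = (T_kf_to_new @ pts_kf_h)[:3]
    proj = K @ pts_new
    z = proj[2].clamp(min=1e-6)
    u_new, v_new = proj[0] / z, proj[1] / z

    # Sample the new view at the reprojected locations (backward warping).
    grid = torch.stack(
        [2.0 * u_new / (W - 1) - 1.0, 2.0 * v_new / (H - 1) - 1.0], dim=-1
    ).reshape(1, H, W, 2)
    warped = F.grid_sample(new_rgb.unsqueeze(0), grid, align_corners=True).squeeze(0)

    # Only penalize pixels that land inside the new view and in front of the camera.
    valid = ((grid.abs() <= 1.0).all(dim=-1).reshape(H, W)
             & (pts_new[2] > 0).reshape(H, W)).float()
    err = (kf_rgb - warped).abs().mean(dim=0)
    return (err * valid).sum() / valid.sum().clamp(min=1.0)
```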
Authors (8)
Jiacheng Chen
Ziyu Jiang
Mingfu Liang
Bingbing Zhuang
Jong-Chyi Su
Sparsh Garg
+2 more
Submitted
October 23, 2025
Key Contributions
AutoScape introduces a novel RGB-D diffusion model for generating geometrically consistent, long-horizon driving scenes. It iteratively generates keyframes conditioned on previously established scene geometry, steers sampling with warp-consistent guidance, and then uses a video diffusion model to interpolate dense frames between keyframes, substantially improving long-horizon FID and FVD scores over the prior state of the art.
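To make the two-stage structure concrete, the skeleton below sketches how the keyframe-then-interpolation pipeline could be organized. Every callable (sample_keyframe, render_point_cloud, fuse_depth, interpolate_clip) stands in for a component the page only describes at a high level, so the interfaces are assumptions, not the authors' API.

```python
from typing import Callable, List, Sequence


def generate_long_horizon_scene(
    camera_poses: Sequence,        # target camera pose for each keyframe
    sample_keyframe: Callable,     # RGB-D diffusion sampler (assumed interface)
    render_point_cloud: Callable,  # renders accumulated geometry into a view
    fuse_depth: Callable,          # lifts a new RGB-D keyframe into the point cloud
    interpolate_clip: Callable,    # video diffusion between adjacent keyframes
) -> List:
    """High-level sketch of the two-stage pipeline described above."""
    keyframes, scene_points = [], None
    for pose in camera_poses:
        # Condition the RGB-D diffusion model on the scene geometry generated
        # so far, rendered into the upcoming camera view (empty for the first).
        geometry_cond = (
            render_point_cloud(scene_points, pose) if scene_points is not None else None
        )
        rgb, depth = sample_keyframe(pose, geometry_cond)
        keyframes.append((rgb, depth))
        # Fuse the new keyframe's geometry so later keyframes stay consistent with it.
        scene_points = fuse_depth(scene_points, rgb, depth, pose)

    # A video diffusion model densifies the sequence between consecutive keyframes.
    video: List = []
    for kf_a, kf_b in zip(keyframes[:-1], keyframes[1:]):
        video.extend(interpolate_clip(kf_a, kf_b))
    return video
```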
Business Value
Enables the creation of highly realistic and consistent virtual environments for training autonomous driving systems, reducing the need for extensive real-world data collection and allowing testing in safe, simulated scenarios.