Abstract
This paper proposes AutoScape, a long-horizon driving scene generation
framework. At its core is a novel RGB-D diffusion model that iteratively
generates sparse, geometrically consistent keyframes, serving as reliable
anchors for the scene's appearance and geometry. To maintain long-range
geometric consistency, the model 1) jointly handles image and depth in a shared
latent space, 2) explicitly conditions on the existing scene geometry (i.e.,
rendered point clouds) from previously generated keyframes, and 3) steers the
sampling process with a warp-consistent guidance. Given high-quality RGB-D
keyframes, a video diffusion model then interpolates between them to produce
dense and coherent video frames. AutoScape generates realistic and
geometrically consistent driving videos of over 20 seconds, improving the
long-horizon FID and FVD scores over the prior state of the art by 48.6% and
43.0%, respectively.
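The abstract describes warp-consistent guidance only at a high level, so the sketch below is an assumption-based illustration rather than the paper's implementation. It shows one plausible form of a warp-consistency term: a previously generated RGB-D keyframe is unprojected with its depth, reprojected into the new camera, and the newly generated view is sampled at those locations and compared against the keyframe colors; such a loss could in principle be used to steer the diffusion sampling. All names and interfaces (warp_consistency_loss, T_kf_to_new, shared intrinsics K) are hypothetical.

```python
import torch
import torch.nn.functional as F


def warp_consistency_loss(kf_rgb, kf_depth, new_rgb, K, T_kf_to_new):
    """Photometric consistency between a previous RGB-D keyframe and a newly
    generated view, via depth-based reprojection (hypothetical interface).

    kf_rgb:      (3, H, W) previous keyframe image
    kf_depth:    (H, W)    previous keyframe depth
    new_rgb:     (3, H, W) current estimate of the new view
    K:           (3, 3)    shared camera intrinsics
    T_kf_to_new: (4, 4)    relative pose, keyframe camera -> new camera
    """
    _, H, W = kf_rgb.shape
    device = kf_rgb.device

    # Homogeneous pixel grid of the keyframe.
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)

    # Unproject keyframe pixels to 3D using the keyframe depth.
    pts_kf = torch.linalg.inv(K) @ pix * kf_depth.reshape(1, -1)
    pts_kf_h = torch.cat([pts_kf, torch.ones(1, H * W, device=device)], dim=0)

    # Move the points into the new camera and project them.
    pts_new = (T_kf_to_new @ pts_kf_h)[:3]
    proj = K @ pts_new
    z = proj[2].clamp(min=1e-6)
    u_new, v_new = proj[0] / z, proj[1] / z

    # Sample the new view at the reprojected locations (backward warping).
    grid = torch.stack(
        [2.0 * u_new / (W - 1) - 1.0, 2.0 * v_new / (H - 1) - 1.0], dim=-1
    ).reshape(1, H, W, 2)
    warped = F.grid_sample(new_rgb.unsqueeze(0), grid, align_corners=True).squeeze(0)

    # Only penalize pixels that land inside the new view and in front of the camera.
    valid = ((grid.abs() <= 1.0).all(dim=-1).reshape(H, W)
             & (pts_new[2] > 0).reshape(H, W)).float()
    err = (kf_rgb - warped).abs().mean(dim=0)
    return (err * valid).sum() / valid.sum().clamp(min=1.0)
```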
Authors (8)
Jiacheng Chen
Ziyu Jiang
Mingfu Liang
Bingbing Zhuang
Jong-Chyi Su
Sparsh Garg
+2 more
Submitted
October 23, 2025
Key Contributions
AutoScape introduces a novel RGB-D diffusion model for generating geometrically consistent, long-horizon driving scenes. It iteratively generates keyframes conditioned on previously established scene geometry, steers sampling with warp-consistent guidance, and then uses a video diffusion model to interpolate dense frames between keyframes, substantially improving long-horizon FID and FVD scores over the prior state of the art.
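To make the two-stage structure concrete, the skeleton below sketches how the keyframe-then-interpolation pipeline could be organized. Every callable (sample_keyframe, render_point_cloud, fuse_depth, interpolate_clip) stands in for a component the page only describes at a high level, so the interfaces are assumptions, not the authors' API.

```python
from typing import Callable, List, Sequence


def generate_long_horizon_scene(
    camera_poses: Sequence,        # target camera pose for each keyframe
    sample_keyframe: Callable,     # RGB-D diffusion sampler (assumed interface)
    render_point_cloud: Callable,  # renders accumulated geometry into a view
    fuse_depth: Callable,          # lifts a new RGB-D keyframe into the point cloud
    interpolate_clip: Callable,    # video diffusion between adjacent keyframes
) -> List:
    """High-level sketch of the two-stage pipeline described above."""
    keyframes, scene_points = [], None
    for pose in camera_poses:
        # Condition the RGB-D diffusion model on the scene geometry generated
        # so far, rendered into the upcoming camera view (empty for the first).
        geometry_cond = (
            render_point_cloud(scene_points, pose) if scene_points is not None else None
        )
        rgb, depth = sample_keyframe(pose, geometry_cond)
        keyframes.append((rgb, depth))
        # Fuse the new keyframe's geometry so later keyframes stay consistent with it.
        scene_points = fuse_depth(scene_points, rgb, depth, pose)

    # A video diffusion model densifies the sequence between consecutive keyframes.
    video: List = []
    for kf_a, kf_b in zip(keyframes[:-1], keyframes[1:]):
        video.extend(interpolate_clip(kf_a, kf_b))
    return video
```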
Business Value
Enables the creation of highly realistic and consistent virtual environments for training autonomous driving systems, reducing the need for extensive real-world data collection and allowing testing in safe, simulated scenarios.