Research Paper (arxiv_ml). Audience: Computer vision researchers, Generative AI developers, Robotics engineers (for simulation)

Generative View Stitching


Abstract

Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion framework that we show already provides the affordances necessary for stitching. We then introduce Omni Guidance, a technique that enhances the temporal consistency in stitching by conditioning on both the past and future, and that enables our proposed loop-closing mechanism for delivering long-range coherence. Overall, GVS achieves camera-guided video generation that is stable, collision-free, frame-to-frame consistent, and closes loops for a variety of predefined camera paths, including Oscar Reutersvärd's Impossible Staircase. Results are best viewed as videos at https://andrewsonga.github.io/gvs.
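The idea of sampling a whole sequence in parallel by stitching overlapping pieces can be illustrated with a generic diffusion-stitching loop. This is a minimal sketch under stated assumptions, not the GVS algorithm: the overlapping-window layout, the rule of averaging clean-frame predictions where windows overlap, the deterministic update between noise levels, and the names `toy_denoiser` and `stitch_sample` are all hypothetical. The toy denoiser (which simply predicts each window's mean) stands in for a real video model trained with Diffusion Forcing, and each frame is a single float in place of a video latent.

```python
import random


def toy_denoiser(window, sigma):
    """Hypothetical stand-in for a Diffusion Forcing video model.

    A real model would predict clean latents for each frame in the window;
    this toy version predicts the window mean for every frame.
    """
    mean = sum(window) / len(window)
    return [mean] * len(window)


def stitch_sample(num_frames, window, stride, sigmas, seed=0):
    """Generic diffusion-stitching sampler (a simplification, not GVS's exact rule).

    At every noise level, all overlapping windows are denoised independently
    (so they could be batched in parallel), their clean-frame predictions are
    averaged where windows overlap, and a deterministic DDIM/Euler-style step
    moves every frame to the next noise level.
    """
    assert stride <= window, "windows must overlap or touch to cover every frame"
    starts = list(range(0, num_frames - window + 1, stride))
    assert starts and starts[-1] + window == num_frames, "windows must tile the sequence"

    rng = random.Random(seed)
    # Every frame starts as pure noise at the highest noise level.
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(num_frames)]

    for i, sigma in enumerate(sigmas):
        next_sigma = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        total = [0.0] * num_frames
        count = [0] * num_frames
        for s in starts:  # independent calls: parallelizable across windows
            pred = toy_denoiser(x[s:s + window], sigma)
            for j, p in enumerate(pred):
                total[s + j] += p
                count[s + j] += 1
        x0 = [t / c for t, c in zip(total, count)]
        # Step from noise level sigma to next_sigma along the predicted clean frames.
        x = [c + (next_sigma / sigma) * (xi - c) for xi, c in zip(x, x0)]
    return x


frames = stitch_sample(num_frames=12, window=4, stride=2, sigmas=[10.0, 5.0, 2.0, 1.0, 0.5])
print(len(frames))  # 12
```

Because overlapping windows share frames, each window's prediction is pulled toward its neighbors on both sides at every step, which is the property an autoregressive rollout lacks.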

Authors (5)

Chonghyuk Song

Michal Stary

Boyuan Chen

George Kopanas

Vincent Sitzmann

Submitted

October 28, 2025

arXiv Category

cs.CV

Key Contributions

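Generative View Stitching (GVS) is a sampling algorithm that generates an entire video sequence in parallel, so that the generated scene stays faithful to every part of a predefined camera trajectory and avoids collisions. Crucially, GVS works with existing off-the-shelf video models trained with Diffusion Forcing and requires no specialized model training. The paper also introduces Omni Guidance, which conditions on both past and future frames to improve temporal consistency and enables a loop-closing mechanism for long-range coherence.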
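The abstract describes Omni Guidance only as conditioning on both the past and the future. One plausible reading, sketched here purely as an assumption, is a classifier-free-guidance-style blend: the conditional prediction sees neighboring past and future frames, the "unconditional" prediction pushes those context frames to maximum noise (which Diffusion Forcing's per-frame noise levels permit), and a weight `w` combines the two. The paper's exact formulation may differ. `toy_context_denoiser`, `omni_guided_prediction`, `ctx_sigma`, and `SIGMA_MAX` are hypothetical names, and the toy denoiser only mimics a model that trusts cleaner frames more.

```python
SIGMA_MAX = 80.0  # assumed maximum noise level; a frame at this level carries almost no signal


def toy_context_denoiser(frames, sigmas):
    """Hypothetical stand-in for a Diffusion Forcing model with per-frame noise levels.

    Predicts every frame as a weighted average of all frames, giving cleaner
    frames (lower sigma) more weight, so heavily noised context barely matters.
    """
    weights = [1.0 / (1.0 + s * s) for s in sigmas]
    avg = sum(wt * f for wt, f in zip(weights, frames)) / sum(weights)
    return [avg] * len(frames)


def omni_guided_prediction(past, target, future, sigma, w, ctx_sigma=0.0):
    """CFG-style guidance on past and future context (an assumed form of Omni Guidance)."""
    frames = past + target + future
    lo, hi = len(past), len(past) + len(target)

    def levels(context_sigma):
        # Target frames sit at the current noise level; context frames get context_sigma.
        return [sigma if lo <= k < hi else context_sigma for k in range(len(frames))]

    cond = toy_context_denoiser(frames, levels(ctx_sigma))[lo:hi]
    # "Unconditional": context pushed to maximum noise. A real sampler would also
    # replace the context values with noise; this toy model only reads the levels.
    uncond = toy_context_denoiser(frames, levels(SIGMA_MAX))[lo:hi]
    return [u + w * (c - u) for c, u in zip(cond, uncond)]
```

With `w = 0` the result is the context-free prediction, with `w = 1` it equals the conditional prediction, and `w > 1` pushes the target frames further toward what the past and future context imply.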

Business Value

Enables the creation of realistic and controllable video content for applications like virtual reality, gaming, film production, and simulation environments, potentially reducing manual effort and costs.

Paper Metadata

Innovation Type

Algorithmic Extension

Deployment Feasibility

High: GVS works with existing video diffusion models trained with Diffusion Forcing, so it can be integrated without retraining.

Limitations Addressed

Addresses the limitation of autoregressive video diffusion models that cannot condition on future frames, leading to issues like collisions with the scene and rapid generation collapse in camera-guided scenarios. It overcomes the need for specially trained models typically required by stitching methods.
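The limitation can be made concrete with the per-frame noise levels that Diffusion Forcing allows during sampling. In this illustrative sketch, the function names are hypothetical and `SIGMA_MAX = 80.0` is an assumed value borrowed from common EDM-style settings. Autoregressive sampling leaves every future frame at maximum noise, so the frame being generated cannot take into account what later frames along the trajectory will need to show; denoising the whole sequence in parallel keeps all frames at a shared noise level, so each frame can draw on partially denoised frames on both sides.

```python
SIGMA_MAX = 80.0  # assumed maximum noise level; a frame at this level carries almost no signal


def autoregressive_levels(num_frames, current, sigma):
    """Per-frame noise levels while autoregressively denoising frame `current`.

    Past frames are already generated (clean, sigma 0); future frames do not
    exist yet, so they sit at maximum noise and cannot inform the current frame.
    """
    return [0.0 if k < current else sigma if k == current else SIGMA_MAX
            for k in range(num_frames)]


def parallel_levels(num_frames, sigma):
    """Per-frame noise levels when the whole sequence is denoised in parallel."""
    return [sigma] * num_frames


print(autoregressive_levels(5, 2, 1.0))  # [0.0, 0.0, 1.0, 80.0, 80.0]
print(parallel_levels(5, 1.0))           # [1.0, 1.0, 1.0, 1.0, 1.0]
```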

Performance Gains

Faithful adherence to the camera trajectory, Avoidance of scene collisions, Compatibility with off-the-shelf models

Technical Tags

generative models, video generation, diffusion models, autoregressive models, camera trajectory, sampling algorithms, parallel sampling, Diffusion Forcing, robot planning, scene generation

Research Topics

Generative AI, Computer Vision, Deep Learning, Video Synthesis

Methods & Architectures

Generative View Stitching (GVS), Parallel sampling, Diffusion model adaptation, Extending diffusion stitching, Autoregressive video diffusion models, Diffusion models

Applications & Tasks

Video generation, 3D scene synthesis, Robotics simulation, Controllable video generation, Camera-guided generation, Long-sequence generation, Generating videos faithful to camera trajectories, Avoiding collisions in generated scenes, Improving stability of long video rollouts

Related Fields

Computer Graphics, Robotics, Generative Models, Deep Learning

Keywords

video generation, diffusion models, generative AI, camera control, autoregressive models, parallel sampling, scene synthesis, Diffusion Forcing, robot planning, sampling algorithm, computational vision

Academic Context

#Generative AI, #Computer Vision, #Deep Learning, #Video Synthesis

Commercial Potential

Potential Products

Controllable video generation tools, 3D scene generation platforms, Virtual environment creation software

Target Industries

Media and Entertainment, Gaming, Virtual Reality, Robotics

Use Case Examples

Generating walkthroughs of virtual environments, Creating animated sequences for films, Simulating robot camera movements

Competitive Edge

Offers a novel sampling strategy for video diffusion models that overcomes the inability of autoregressive models to condition on future frames, and it works with existing Diffusion Forcing models, unlike prior stitching methods, which usually require a specially trained model.

Resource Requirements

Compute Needs

Requires significant compute for diffusion model inference. GVS itself adds no training cost, since it reuses an off-the-shelf video model trained with Diffusion Forcing.

Data Requirements

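No additional training data is needed to apply GVS; only the underlying video model (trained with Diffusion Forcing) requires video datasets, and an existing model can be reused.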

Deployment Constraints

Inference time for long sequences may still be a constraint.

Scalability

Samples the entire sequence in parallel rather than frame by frame, which may improve efficiency when generating longer sequences.

Production Readiness

Maturity Level

Research
