Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

Abstract

Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can control the objects in the image to naturally leave the scene, or provide brand-new identity references to enter the scene, guided by a user-specified motion trajectory. To support this task, we introduce a semi-automatically curated dataset, an efficient identity-preserving, motion-controllable video Diffusion Transformer architecture, and a comprehensive evaluation protocol targeting this task. Our evaluation shows that the proposed approach significantly outperforms existing baselines.
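The abstract implies a conditioning interface with three user-facing inputs: the source frame, an optional identity reference for a subject entering the scene, and a per-frame motion trajectory that may leave the visible canvas (the "unbounded" frame-out case). The sketch below is a minimal illustration of that interface, not the paper's actual API; all names, shapes, and the `build_conditioning` helper are assumptions.

```python
# Hypothetical sketch of the conditioning inputs described in the abstract.
# Names and tensor shapes are assumptions, not the paper's implementation.
import torch

def build_conditioning(
    first_frame: torch.Tensor,           # (3, H, W) source image
    identity_ref: torch.Tensor | None,   # (3, h, w) reference of an entering subject, or None
    trajectory: torch.Tensor,            # (T, 2) per-frame (x, y) anchors in canvas coordinates
    canvas_hw: tuple[int, int],
) -> dict:
    H, W = canvas_hw
    # Frames whose anchor lies outside the canvas correspond to the subject
    # being framed out; the generator must keep the background coherent there.
    x, y = trajectory[:, 0], trajectory[:, 1]
    inside = (x >= 0) & (x < W) & (y >= 0) & (y < H)  # (T,) bool
    return {
        "first_frame": first_frame,
        "identity_ref": identity_ref,    # None for a pure frame-out sequence
        "trajectory": trajectory,
        "visible_mask": inside,
    }

# Toy usage: a subject that exits stage right over 16 frames.
T, H, W = 16, 256, 256
traj = torch.stack([torch.linspace(64, W + 64, T), torch.full((T,), 128.0)], dim=1)
cond = build_conditioning(torch.zeros(3, H, W), None, traj, (H, W))
print(cond["visible_mask"])  # True while inside the canvas, False once framed out
```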
Authors: Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng
Submitted: May 27, 2025
arXiv Category: cs.CV

Key Contributions

Introduces a novel image-to-video generation method that enables controllable object entry and exit along user-specified motion trajectories, inspired by the cinematic Frame In/Frame Out technique. The work contributes a semi-automatically curated dataset, an efficient identity-preserving Diffusion Transformer architecture, and a comprehensive evaluation protocol for this task; a sketch of what such an evaluation might measure follows below.
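A task like this plausibly needs at least two scores: identity preservation (does the generated subject match the reference?) and trajectory adherence (does the subject follow the requested path?). The sketch below shows one hypothetical shape such metrics could take; the embedder and tracker are stand-ins, and the paper's actual protocol may define these quantities differently.

```python
# Hypothetical evaluation metrics for frame in/out generation.
# Both functions are illustrative assumptions, not the paper's protocol.
import torch
import torch.nn.functional as F

def identity_score(ref_emb: torch.Tensor, gen_embs: torch.Tensor) -> float:
    # ref_emb: (D,) embedding of the identity reference image
    # gen_embs: (T, D) embeddings of per-frame crops of the generated subject
    # Higher mean cosine similarity = better identity preservation.
    return F.cosine_similarity(gen_embs, ref_emb.unsqueeze(0), dim=1).mean().item()

def trajectory_error(target: torch.Tensor, observed: torch.Tensor) -> float:
    # target, observed: (T, 2) per-frame (x, y) subject centers
    # Mean Euclidean distance in pixels; lower = better trajectory adherence.
    return (target - observed).norm(dim=1).mean().item()

# Toy check with random embeddings and a small tracking offset.
ref = torch.randn(512)
gen = ref.unsqueeze(0).repeat(8, 1) + 0.05 * torch.randn(8, 512)
tgt = torch.rand(8, 2) * 256
obs = tgt + torch.randn(8, 2)
print(identity_score(ref, gen), trajectory_error(tgt, obs))
```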

Business Value

Empowers creators with more intuitive tools for generating dynamic video content from static images, reducing production time and costs for marketing, social media, and entertainment.