📄 Abstract
Controllability, temporal coherence, and detail synthesis remain the most
critical challenges in video generation. In this paper, we focus on a commonly
used yet underexplored cinematic technique known as Frame In and Frame Out.
Specifically, starting from image-to-video generation, users can direct
objects in the image to naturally leave the scene, or introduce new identity
references that enter the scene, guided by a user-specified motion trajectory.
To support this task, we introduce a semi-automatically curated dataset, an
efficient identity-preserving, motion-controllable video Diffusion Transformer
architecture, and a comprehensive evaluation protocol tailored to this task.
Our evaluation shows that the proposed approach significantly outperforms
existing baselines.
Authors (4)
Boyang Wang
Xuweiyi Chen
Matheus Gadelha
Zezhou Cheng
Key Contributions
Introduces a novel image-to-video generation method enabling controllable object entry/exit based on user-specified motion trajectories, inspired by cinematic 'Frame In/Frame Out' techniques. It proposes a new dataset, an efficient identity-preserving Diffusion Transformer architecture, and a comprehensive evaluation protocol.
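As a concrete illustration of the trajectory-guided control described above, below is a minimal sketch of how a user-specified motion trajectory might be rasterized into per-frame conditioning maps for an image-to-video model. This is an assumption-laden example, not the paper's actual interface: the function trajectory_to_maps, the Gaussian-heatmap encoding, and all shapes are hypothetical.

# Hypothetical sketch: turning a user-specified trajectory into per-frame
# conditioning maps. All names, shapes, and the heatmap encoding are
# illustrative assumptions, not the method described in the paper.
import numpy as np

def trajectory_to_maps(points, num_frames, height, width, sigma=8.0):
    """Convert (x, y) waypoints, one per frame, into Gaussian heatmaps
    of shape (num_frames, height, width)."""
    assert len(points) == num_frames
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((num_frames, height, width), dtype=np.float32)
    for t, (x, y) in enumerate(points):
        maps[t] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# Example: an object "framing in" from the left edge and crossing the scene.
num_frames, H, W = 16, 64, 64
waypoints = [(-4 + t * (W + 8) / (num_frames - 1), H / 2) for t in range(num_frames)]
cond_maps = trajectory_to_maps(waypoints, num_frames, H, W)
print(cond_maps.shape)  # (16, 64, 64)

In a setup like this, the per-frame maps could be concatenated with the video latents as an extra conditioning channel; whether the paper encodes trajectories this way is not specified on this page.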
Business Value
Empowers creators with more intuitive tools for generating dynamic video content from static images, reducing production time and costs for marketing, social media, and entertainment.