📄 Abstract
We present a method for augmenting real-world videos with newly generated
dynamic content. Given an input video and a simple user-provided text
instruction describing the desired content, our method synthesizes dynamic
objects or complex scene effects that naturally interact with the existing
scene over time. The position, appearance, and motion of the new content are
seamlessly integrated into the original footage while accounting for camera
motion, occlusions, and interactions with other dynamic objects in the scene,
resulting in a cohesive and realistic output video. We achieve this via a
zero-shot, training-free framework that harnesses a pre-trained text-to-video
diffusion transformer to synthesize the new content and a pre-trained
vision-language model to envision the augmented scene in detail. Specifically,
we introduce a novel inference-based method that manipulates features within
the attention mechanism, enabling accurate localization and seamless
integration of the new content while preserving the integrity of the original
scene. Our method is fully automated, requiring only a simple user instruction.
We demonstrate its effectiveness on a wide range of edits applied to real-world
videos, encompassing diverse objects and scenarios involving both camera and
object motion.
Authors (4)
Danah Yatim
Rafail Fridman
Omer Bar-Tal
Tali Dekel
Submitted
February 5, 2025
Key Contributions
Presents a method for augmenting real-world videos with newly generated dynamic content from a text instruction. Synthesized objects and effects are integrated into the footage while accounting for scene dynamics, camera motion, and occlusions. The framework is zero-shot and training-free: it manipulates attention features of a pre-trained text-to-video diffusion transformer and uses a pre-trained vision-language model to envision the augmented scene.
Business Value
Lets creators add complex dynamic elements to videos from a simple text instruction, which could shorten post-production workflows and open new visual storytelling possibilities for marketing, entertainment, and social media content.