OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models

Abstract

We consider the problem of text-to-video generation with precise control, for applications such as camera movement control and video-to-video editing. Most methods tackling this problem rely on user-defined controls, such as binary masks or camera movement embeddings. We instead propose OnlyFlow, an approach that conditions the motion of generated videos on the optical flow extracted from an input video. Given a text prompt and an input video, OnlyFlow lets the user generate videos that respect both the motion of the input video and the text prompt. This is implemented by applying an optical flow estimation model to the input video and feeding the resulting flow to a trainable optical flow encoder, whose output feature maps are then injected into the text-to-video backbone model. We perform quantitative, qualitative and user preference studies showing that OnlyFlow compares favorably with state-of-the-art methods on a wide range of tasks, even though it was not specifically trained for those tasks. OnlyFlow thus constitutes a versatile, lightweight yet efficient method for controlling motion in text-to-video generation. Models and code will be made available on GitHub and HuggingFace.
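The abstract describes a three-stage data flow: optical flow is estimated on the input video, a trainable encoder turns that flow into feature maps, and those maps are injected into the text-to-video backbone. The pure-Python sketch below illustrates only that data flow on toy grayscale frames; it is not OnlyFlow's implementation. The brute-force block-matching estimator is a swapped-in stand-in for a learned optical flow model, the per-pixel linear `flow_encoder` stands in for the trainable encoder, and the additive `inject` step (h + strength * f) is an assumed injection mechanism. All function names and parameters here are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]             # grayscale frame, H x W
Flow = List[List[Tuple[int, int]]]    # per-pixel (dy, dx) displacement
FeatureMap = List[List[List[float]]]  # H x W x C features


def block_matching_flow(prev: Frame, nxt: Frame, radius: int = 1) -> Flow:
    """Toy stand-in for a learned optical flow estimator (not the paper's model).

    For each pixel of `prev`, search displacements within `radius` and keep the one
    whose target pixel in `nxt` has the closest intensity. Zero motion is the initial
    candidate and is only replaced by a strictly lower error.
    """
    h, w = len(prev), len(prev[0])
    flow: Flow = []
    for y in range(h):
        row = []
        for x in range(w):
            best, best_err = (0, 0), abs(prev[y][x] - nxt[y][x])
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        err = abs(prev[y][x] - nxt[ny][nx])
                        if err < best_err:
                            best, best_err = (dy, dx), err
            row.append(best)
        flow.append(row)
    return flow


def flow_encoder(flow: Flow, weights: List[Tuple[float, float]], bias: List[float]) -> FeatureMap:
    """Hypothetical encoder: per-pixel linear map from (dy, dx) to C = len(weights) channels."""
    return [
        [[w_dy * dy + w_dx * dx + b for (w_dy, w_dx), b in zip(weights, bias)] for (dy, dx) in row]
        for row in flow
    ]


def inject(backbone: FeatureMap, flow_feats: FeatureMap, strength: float = 1.0) -> FeatureMap:
    """Assumed additive injection: h <- h + strength * f at every pixel and channel."""
    return [
        [[h + strength * f for h, f in zip(h_px, f_px)] for h_px, f_px in zip(h_row, f_row)]
        for h_row, f_row in zip(backbone, flow_feats)
    ]


def flow_conditioned_features(frames: List[Frame], backbone: List[FeatureMap],
                              weights: List[Tuple[float, float]], bias: List[float],
                              strength: float = 1.0) -> List[FeatureMap]:
    """Estimate flow between consecutive frames, encode it, and inject it into the matching map.

    T input frames give T - 1 flow fields, so `backbone` is assumed to hold T - 1 feature maps.
    """
    out = []
    for t in range(len(frames) - 1):
        flow = block_matching_flow(frames[t], frames[t + 1])
        out.append(inject(backbone[t], flow_encoder(flow, weights, bias), strength))
    return out


# Usage: a single bright pixel moves one column to the right between two 3 x 4 frames.
prev = [[0.0] * 4 for _ in range(3)]
nxt = [[0.0] * 4 for _ in range(3)]
prev[1][1] = 1.0
nxt[1][2] = 1.0
weights = [(1.0, 0.0), (0.0, 1.0)]  # channel 0 reads dy, channel 1 reads dx
bias = [0.0, 0.0]
backbone = [[[[0.5, 0.5] for _ in range(4)] for _ in range(3)]]  # one H x W x C map
out = flow_conditioned_features([prev, nxt], backbone, weights, bias, strength=0.5)
print(block_matching_flow(prev, nxt)[1][1])  # (0, 1)
print(out[0][1][1])                          # [0.5, 1.0]
```

In an actual model the encoder weights would be learned and the injected features would have to match the shapes of the backbone's internal feature maps; here they are fixed by hand so the arithmetic stays easy to follow. The `strength` factor is included only to show how the influence of the flow features could be scaled.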
Authors (4)
Mathis Koroglu
Hugo Caselles-Dupré
Guillaume Jeanneret Sanmiguel
Matthieu Cord
Submitted
November 15, 2024
arXiv Category
cs.CV
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 6225-6235

Key Contributions

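Introduces OnlyFlow, an approach to text-to-video generation that conditions the motion of the generated output on optical flow extracted from an input video. An optical flow estimation model processes the input video, a trainable optical flow encoder turns the flow into feature maps, and these maps are injected into the text-to-video backbone. The result is precise motion control: the generated video follows both the text prompt and the dynamics of the input video.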

Business Value

Enables more sophisticated and controllable video generation tools for creative industries, potentially reducing production costs and time for visual effects, animation, and personalized content.