Improving Video Generation with Human Feedback

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward-weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
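
All three alignment algorithms fall out of the same KL-regularized objective: maximize the learned reward while keeping the fine-tuned model close to the reference, i.e. max_θ E[r(x, c)] - β · KL(π_θ ‖ π_ref). As a concrete illustration, here is a minimal sketch of how a Flow-DPO-style pairwise loss could look for a rectified-flow model, where a sample's implicit reward is its negative flow-matching error relative to a frozen reference network. The `net(x_t, t, cond)` signature, the straight-line interpolation, and the `beta` value are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def flow_dpo_loss(model, ref_model, x_w, x_l, cond, beta=500.0):
    """Sketch of a DPO-style loss for rectified-flow models (assumed form).

    x_w, x_l: preferred / rejected video latents, shape (B, ...).
    cond:     conditioning inputs (e.g., text embeddings).
    A sample's implicit reward is its negative flow-matching error,
    measured relative to a frozen reference model.
    """
    b = x_w.shape[0]
    t = torch.rand(b, device=x_w.device)    # random flow time in [0, 1]
    noise = torch.randn_like(x_w)

    def fm_error(net, x1):
        # Rectified-flow interpolation x_t = (1 - t) * noise + t * x1;
        # the target velocity for a straight-line flow is x1 - noise.
        t_ = t.view(-1, *([1] * (x1.dim() - 1)))
        x_t = (1.0 - t_) * noise + t_ * x1
        v = net(x_t, t, cond)               # assumed network signature
        return ((v - (x1 - noise)) ** 2).flatten(1).mean(dim=1)

    with torch.no_grad():                   # reference model stays frozen
        ref_w, ref_l = fm_error(ref_model, x_w), fm_error(ref_model, x_l)
    err_w, err_l = fm_error(model, x_w), fm_error(model, x_l)

    # Lower error than the reference means higher implicit reward, so the
    # preference logit is the negated error gap between winner and loser.
    logits = -beta * ((err_w - ref_w) - (err_l - ref_l))
    return -F.logsigmoid(logits).mean()
```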
Authors (17)
Jie Liu
Gongye Liu
Jiajun Liang
Ziyang Yuan
Xiaokun Liu
Mingwu Zheng
+11 more
Submitted
January 23, 2025
arXiv Category
cs.CV

Key Contributions

Develops a systematic pipeline that uses human feedback to improve video generation, targeting motion smoothness and prompt alignment. It introduces a large-scale human preference dataset, the multi-dimensional VideoReward model, and three alignment algorithms derived from a unified RL perspective: the training-time Flow-DPO and Flow-RWR, plus the inference-time Flow-NRG.
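
As an illustration of the inference-time idea, the sketch below shows one way reward guidance with user-chosen per-objective weights could be folded into a flow sampling step: the noisy latent is nudged along the gradient of a weighted sum of reward scores. The reward-model interface `r(x_t, t)` and the `scale` parameter are hypothetical, not the paper's published API.

```python
import torch

def guided_velocity(v_pred, x_t, t, reward_models, weights, scale=1.0):
    """Sketch of Flow-NRG-style reward guidance on a noisy latent (assumed form).

    reward_models: callables r(x_t, t) -> (B,) scores on noisy inputs.
    weights:       per-objective weights chosen by the user at inference.
    """
    x = x_t.detach().requires_grad_(True)
    total = sum(w * r(x, t).sum() for w, r in zip(weights, reward_models))
    grad = torch.autograd.grad(total, x)[0]
    # Follow the learned flow while ascending the combined reward.
    return v_pred + scale * grad
```

Because the weights enter only at sampling time, a user could, for example, upweight a motion-quality reward relative to a text-alignment reward without retraining the generator.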

Business Value

Enables the creation of higher-quality, more controllable, and human-aligned video content, opening up new possibilities for creative industries and personalized media.