arxiv_ai, Research Paper, for AI Researchers, ML Engineers, Content Creators, Developers of Generative Models

Improving Video Generation with Human Feedback

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multi-dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models. These include two training-time strategies: direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
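To make the reward-modeling step more concrete, here is a minimal sketch of how pairwise annotations can train a scalar reward and how per-dimension scores can be combined with user-chosen weights. It assumes a Bradley-Terry preference loss, which is a common choice for pairwise data; the abstract does not say which loss VideoReward uses. The function names and the dimension names `motion_smoothness` and `prompt_alignment` are hypothetical and chosen only for illustration.

```python
import math


def bradley_terry_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred video wins (assumed Bradley-Terry model)."""
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a numerically stable way.
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


def combine_rewards(scores: dict, weights: dict) -> float:
    """Weighted sum of per-dimension reward scores (hypothetical aggregation rule)."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical per-dimension scores for one generated video.
scores = {"motion_smoothness": 0.8, "prompt_alignment": 0.4}
# A user who cares more about prompt alignment than motion smoothness.
weights = {"motion_smoothness": 0.25, "prompt_alignment": 0.75}

total_reward = combine_rewards(scores, weights)
loss = bradley_terry_loss(reward_preferred=1.2, reward_rejected=0.3)
```

The loss shrinks as the reward assigned to the preferred video rises above the reward assigned to the rejected one, which is the behavior a pairwise-trained reward model needs.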

Authors (17)

Jie Liu

Gongye Liu

Jiajun Liang

Ziyang Yuan

Xiaokun Liu

Mingwu Zheng

+11 more

Submitted

January 23, 2025

arXiv Category

cs.CV

Key Contributions

Develops a systematic pipeline that uses human feedback to improve video generation, targeting motion smoothness and prompt alignment. It introduces a large-scale human preference dataset, the multi-dimensional VideoReward model, and three alignment algorithms derived from a unified RL perspective: two training-time methods (Flow-DPO and Flow-RWR) and one inference-time method (Flow-NRG).
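The two training-time objectives can be sketched roughly as follows. This is an illustrative sketch, not the paper's formulation: the DPO-style loss follows the form used for preference optimization of diffusion models, applied here to per-sample velocity-prediction errors, and the RWR-style loss weights each sample's regression error by an exponentiated reward. The function names, the `beta` parameter, and the use of plain squared error are all assumptions.

```python
import math


def squared_error(pred, target):
    """Sum of squared differences between a predicted and a target velocity vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def dpo_style_loss(err_win_model, err_win_ref, err_lose_model, err_lose_ref, beta=1.0):
    """DPO-style loss on prediction errors (assumed form).

    The logit is positive when the trained model improves on the frozen
    reference more for the preferred sample than for the rejected one.
    """
    logit = beta * ((err_lose_model - err_lose_ref) - (err_win_model - err_win_ref))
    # -log(sigmoid(logit)), computed in a numerically stable way.
    return math.log1p(math.exp(-abs(logit))) + max(-logit, 0.0)


def rwr_style_loss(errors, rewards, beta=1.0):
    """Reward-weighted regression (assumed form): higher-reward samples count more."""
    weights = [math.exp(r / beta) for r in rewards]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


# Toy example: the model fits the preferred sample better than the reference does.
err_win = squared_error([0.9, 0.1], [1.0, 0.0])   # trained model, preferred video
err_lose = squared_error([0.5, 0.5], [1.0, 0.0])  # trained model, rejected video
preference_loss = dpo_style_loss(err_win, 0.1, err_lose, 0.1)
regression_loss = rwr_style_loss([err_win, err_lose], rewards=[1.0, -1.0])
```

In both cases a frozen reference model or a temperature such as `beta` plays the role of the KL regularizer mentioned in the abstract, keeping the fine-tuned model close to its starting point.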

Business Value

Enables the creation of higher-quality, more controllable, and human-aligned video content, opening up new possibilities for creative industries and personalized media.

Paper Metadata

Innovation Type

Pipeline / Methodology

Deployment Feasibility

Moderate. Requires infrastructure for collecting human feedback, training reward models, and fine-tuning generative models. Integration into existing video generation platforms is possible.

Limitations Addressed

Issues like unsmooth motion and misalignment in video generation, Difficulty in capturing nuanced human preferences for video quality, Lack of effective methods to incorporate human feedback into flow-based video models

Technical Tags

Video Generation, Human Feedback, Reinforcement Learning, Rectified Flow, Prompt Alignment, Motion Smoothness, Video Reward Model, Direct Preference Optimization (DPO), Reward Weighted Regression (RWR), Inference-time Techniques

Research Topics

Generative Models, Video Synthesis, Human-in-the-Loop AI, Reinforcement Learning from Human Feedback (RLHF), AI Alignment

Methods & Architectures

Human preference dataset construction, VideoReward model, Reinforcement learning, Direct Preference Optimization (Flow-DPO), Reward Weighted Regression (Flow-RWR), Flow-based models

Applications & Tasks

Multimedia Content Creation, Artificial Intelligence Research, Unsmooth motion in video generation, Misalignment between videos and prompts, Improving video quality and coherence, Leveraging human feedback effectively, Refining video generation models, Maximizing video quality and prompt alignment, Generating coherent and smooth videos

Datasets & Benchmarks

Datasets

Large-scale human preference dataset

Related Fields

Generative AI, Computer Vision, Reinforcement Learning, Human-Computer Interaction, Natural Language Processing

Keywords

video generation, human feedback, reinforcement learning, rectified flow, alignment, prompt alignment, motion smoothness, reward model, DPO, RWR, generative models, content creation, multimedia

Academic Context

#Generative Models, #Video Synthesis, #Human-in-the-Loop AI, #Reinforcement Learning from Human Feedback (RLHF), #AI Alignment

Technology Stack

Frameworks & Libraries

Flow-based models

Commercial Potential

Potential Products

Advanced video generation platform, AI tool for video editing and enhancement, Personalized video content creation service

Target Industries

Media and Entertainment, Advertising, Gaming, Education, Marketing

Use Case Examples

Generating realistic and coherent video clips from text descriptions. Creating personalized video advertisements tailored to individual users.

Competitive Edge

Advances video generation by systematically incorporating human preferences via RLHF techniques, leading to improved quality and alignment compared to models trained solely on objective metrics.

Market Opportunity

Very large, driven by the exponential growth in video content consumption and creation.

Revenue Models

SaaS platform for video generation, API access, licensing of the technology.

Resource Requirements

Compute Needs

High, for training reward models and fine-tuning generative models.

Data Requirements

Large-scale human preference dataset for video quality and alignment.

Deployment Constraints

Scalability of human feedback collection, Computational cost of RLHF training, Ensuring diversity and quality of human annotators

Scalability

Scalability depends on the efficiency of the reward model training and the generative model fine-tuning process.

Regulatory Considerations

Ethical considerations regarding AI-generated content, potential for misuse.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years, for a production-ready system.

Patent Potential

High, for the VideoReward model, alignment algorithms, and the overall pipeline.
