Abstract
Recent studies have identified Direct Preference Optimization (DPO) as an
efficient, reward-free approach to improving video generation quality.
However, existing methods largely follow image-domain paradigms and are mainly
developed on small-scale models (approximately 2B parameters), limiting their
ability to address the unique challenges of video tasks, such as costly data
construction, unstable training, and heavy memory consumption. To overcome
these limitations, we introduce GT-Pair, which automatically builds
high-quality preference pairs by using real videos as positives and
model-generated videos as negatives, eliminating the need for any external
annotation. We further present Reg-DPO, which incorporates the SFT loss as a
regularization term into the DPO objective to enhance training stability and
generation fidelity. In addition, by combining the FSDP framework with
multiple memory-optimization techniques, our approach achieves nearly three
times the training capacity of FSDP alone. Extensive experiments on both I2V
and T2V tasks across multiple datasets demonstrate that our method
consistently outperforms existing approaches, delivering superior video
generation quality.
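To make the core idea concrete, below is a minimal sketch of what a DPO objective with an SFT regularizer could look like for a diffusion-based video model. The error terms (`err_*`), the hyper-parameters `beta` and `lam`, and the function name `reg_dpo_loss` are illustrative assumptions, not the paper's implementation or values.

```python
import torch.nn.functional as F

def reg_dpo_loss(err_w, err_l, err_w_ref, err_l_ref, beta=5000.0, lam=1.0):
    """Sketch of a DPO loss with an SFT regularizer (assumed form).

    err_* are per-sample diffusion denoising errors, e.g.
    ||eps - eps_theta(x_t, t)||^2, for the preferred (w) and
    dispreferred (l) videos under the trained and frozen reference models.
    """
    # Implicit reward margin in the style of Diffusion-DPO: how much more the
    # trained model improves on the preferred sample than on the dispreferred
    # one, relative to the reference model.
    margin = (err_w - err_w_ref) - (err_l - err_l_ref)
    dpo_term = -F.logsigmoid(-beta * margin).mean()

    # SFT regularizer: a plain denoising loss on the preferred (real) videos,
    # intended to stabilize training and preserve generation fidelity.
    sft_term = err_w.mean()

    return dpo_term + lam * sft_term
```

Intuitively, the SFT term keeps the policy anchored to real data while the DPO term widens the gap between real and generated videos, which is the stabilizing effect the abstract attributes to Reg-DPO.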
Authors (10)
Jie Du
Xinyu Gong
Qingshan Tan
Wen Li
Yangming Cheng
Weitao Wang
+4 more
Submitted
November 3, 2025
Key Contributions
Introduces Reg-DPO, which regularizes the DPO objective with an SFT loss, together with GT-Pair, an annotation-free scheme that automatically builds preference pairs for high-quality video generation. Together, these address key challenges in training large video models, improving stability and fidelity while reducing data-annotation cost and memory consumption.
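The GT-Pair construction is simple enough to sketch: each real video acts as the positive and a generation from the current model, conditioned on the same prompt, acts as the negative. The names below (`PreferencePair`, `build_gt_pairs`, `generate`) are hypothetical and only illustrate the pairing logic described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class PreferencePair:
    prompt: str
    chosen_video: str      # path to the real (ground-truth) video
    rejected_video: str    # path to the model-generated video

def build_gt_pairs(
    real_videos: Iterable[Tuple[str, str]],   # (prompt, path) of real clips
    generate: Callable[[str], str],           # model inference: prompt -> generated video path
) -> List[PreferencePair]:
    """Hypothetical GT-Pair builder: real videos serve as positives and the
    model's own generations as negatives, so no human or reward-model
    annotation is needed."""
    pairs = []
    for prompt, real_path in real_videos:
        gen_path = generate(prompt)  # sample a negative from the model being trained
        pairs.append(PreferencePair(prompt, real_path, gen_path))
    return pairs
```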
Business Value
Enables more efficient and effective creation of high-quality video content, accelerating production pipelines in media and entertainment industries.