Research Paper (arxiv_cv). Audience: AI Researchers, Generative Model Developers, RL Engineers, Content Creation Tool Developers

Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning

Abstract

While recent methods such as VACE and Phantom have advanced video generation for specific subjects in diverse scenarios, they struggle with multi-human identity preservation in dynamic interactions, where consistent identities across multiple characters are critical. To address this, we propose Identity-GRPO, a human feedback-driven optimization pipeline for refining multi-human identity-preserving video generation. First, we construct a video reward model trained on a large-scale preference dataset containing human-annotated and synthetic distortion data, with pairwise annotations focused on maintaining human consistency throughout the video. We then employ a GRPO variant tailored for multi-human consistency, which greatly enhances both VACE and Phantom. Through extensive ablation studies, we evaluate the impact of annotation quality and design choices on policy optimization. Experiments show that Identity-GRPO achieves up to 18.9% improvement in human consistency metrics over baseline methods, offering actionable insights for aligning reinforcement learning with personalized video generation.

Authors (6)

Xiangyu Meng

Zixian Zhang

Zhenghao Zhang

Junchao Liao

Long Qin

Weizhi Wang

Submitted

October 16, 2025

arXiv Category

cs.CV

Key Contributions

Introduces Identity-GRPO, a human feedback-driven optimization pipeline for refining multi-human identity-preserving video generation. It constructs a video reward model trained on human preferences and employs a GRPO variant tailored for multi-human consistency, significantly enhancing existing models like VACE and Phantom.
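As a rough illustration of the group-relative idea behind GRPO, the sketch below standardizes reward scores within a group of videos sampled for the same prompt. The function name, the example scores, and the epsilon value are hypothetical, and this summary does not describe how the paper's multi-human variant changes this step, so treat it as a generic sketch rather than the authors' method.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Samples scoring above the group mean get a positive advantage and
    samples below it get a negative one. `eps` (an assumed value) avoids
    division by zero when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical identity-consistency scores from a reward model for four
# videos generated from the same prompt and reference images.
scores = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(scores)
```

In a policy update, each sample's advantage would then weight its likelihood term, so the generator is pushed toward videos the reward model rates as more identity-consistent than their group peers.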

Business Value

Enables the creation of more believable and engaging multi-character videos for entertainment, gaming, and virtual interactions, improving user experience and content quality.

Paper Metadata

Innovation Type

Training Methodology/Framework

Deployment Feasibility

Moderate. Requires significant data collection for preference datasets and complex RL training pipelines. Integration with existing generative models is key.

Limitations Addressed

Existing methods struggle with multi-human identity preservation in dynamic interactions, where consistent identities across multiple characters are critical. They also lack effective mechanisms to incorporate human preferences for refining generation quality.

Performance Gains

Up to 18.9% improvement in human consistency metrics compared to baseline models (VACE, Phantom).

Technical Tags

multi-human video generation, identity preservation, reinforcement learning, human feedback, reward model, GRPO, VACE, Phantom, human consistency, preference dataset

Research Topics

Generative AI, Video Generation, Identity Preservation, Reinforcement Learning, Human Feedback

Methods & Architectures

Identity-GRPO, human feedback-driven optimization, video reward model, GRPO variant, preference dataset

Applications & Tasks

Content Creation, Digital Media, Virtual Environments, Gaming, Social Media

Multi-human identity preservation in dynamic interactions, Maintaining consistent identities across multiple characters, Refining generative models with human preferences

Multi-human Video Generation, Identity-Preserving Video Synthesis, Reinforcement Learning from Human Feedback

Datasets & Benchmarks

Datasets

large-scale preference dataset

human consistency, identity preservation, video quality

Related Fields

Generative AI, Reinforcement Learning, Computer Vision, Human-Computer Interaction, Deep Learning

Keywords

multi-human, identity preservation, video generation, reinforcement learning, human feedback, RLHF, GRPO, reward model, generative AI, consistency

Academic Context

#Generative AI, #Video Generation, #Identity Preservation, #Reinforcement Learning, #Human Feedback

Commercial Potential

Potential Products

Advanced video generation platforms, Tools for creating realistic virtual characters, AI-powered animation software

Target Industries

Gaming, Film and Animation, Virtual Reality, Social Media, Advertising

Use Case Examples

Generating realistic scenes with multiple interacting characters in a video game. Creating animated sequences for movies with consistent character identities. Developing virtual social spaces with believable avatars.

Competitive Edge

Addresses a critical gap in multi-human video generation by integrating human feedback via RL, offering a more refined and consistent output than methods relying solely on unsupervised or supervised learning.

Market Opportunity

Growing demand for sophisticated AI tools in the creative industries.

Revenue Models

SaaS for video generation services, licensing to game studios and animation houses.

Resource Requirements

Compute Needs

High, due to the need for training large generative models, reward models, and performing RL optimization.

Data Requirements

Requires a large-scale preference dataset with human annotations comparing different video generations.
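To make the role of pairwise annotations concrete, the sketch below shows a Bradley-Terry style loss, a common choice for training reward models from preference pairs. This summary does not state which loss the paper uses, so the loss, the function name, and the example scores are all assumptions for illustration.

```python
import math

def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the reward model scores the annotator-preferred
    video well above the rejected one, and large when the order is reversed.
    Both branches compute log(1 + exp(-margin)); the split avoids overflow
    in exp() for large margins of either sign.
    """
    margin = score_preferred - score_rejected
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward-model scores for one annotated pair of generated videos.
loss_agree = pairwise_preference_loss(2.0, 0.0)     # model agrees with annotator
loss_disagree = pairwise_preference_loss(0.0, 2.0)  # model disagrees
```

Averaging this loss over many annotated pairs and minimizing it pushes the reward model to rank videos the way the annotators did, which is why the quality of the preference labels matters for the downstream RL stage.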

Deployment Constraints

Complexity of the RL training pipeline and the need for high-quality human preference data are significant challenges.

Scalability

Scalability depends on the efficiency of the underlying generative model and the RL training framework.

Production Readiness

Maturity Level

Research

Time to Market

2-3 years

Patent Potential

High, for the Identity-GRPO framework and the specific reward modeling and optimization techniques for multi-human consistency.
