Abstract
We propose Flow-GRPO, the first method to integrate online policy gradient
reinforcement learning (RL) into flow matching models. Our approach uses two
key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic
Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential
Equation (SDE) that matches the original model's marginal distribution at all
timesteps, enabling statistical sampling for RL exploration; and (2) a
Denoising Reduction strategy that reduces training denoising steps while
retaining the original number of inference steps, significantly improving
sampling efficiency without sacrificing performance. Empirically, Flow-GRPO is
effective across multiple text-to-image tasks. For compositional generation,
RL-tuned SD3.5-M generates nearly perfect object counts, spatial relations, and
fine-grained attributes, increasing GenEval accuracy from $63\%$ to $95\%$. In
visual text rendering, accuracy improves from $59\%$ to $92\%$, greatly
enhancing text generation. Flow-GRPO also achieves substantial gains in human
preference alignment. Notably, we observe very little reward hacking: rewards
do not increase at the cost of appreciable degradation in image quality or
diversity.
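
To make the ODE-to-SDE conversion concrete, the sketch below contrasts a deterministic Euler step of the flow-matching ODE with a stochastic Euler–Maruyama step that, under the rectified-flow interpolation $x_t = (1-t)x_0 + t\epsilon$, shares the same marginals by adding a score-correction term to the drift. This is a minimal illustration under those assumptions, not the paper's exact sampler: the `velocity_model` interface, the noise level `sigma`, and the step sizes are placeholders.

```python
import torch

@torch.no_grad()
def ode_step(velocity_model, x, t, dt):
    """Deterministic Euler step of the flow-matching ODE dx = v(x, t) dt,
    integrating from the noise side (t near 1) toward the data side (t = 0)."""
    v = velocity_model(x, t)
    return x - dt * v

@torch.no_grad()
def sde_step(velocity_model, x, t, dt, sigma):
    """Stochastic Euler-Maruyama step whose marginals match the ODE above,
    assuming x_t = (1 - t) * x_0 + t * eps, in which case the score can be
    expressed through the velocity as score = -(x + (1 - t) * v) / t.
    sigma controls exploration noise; sigma = 0 recovers the ODE step.
    Note the 1/t factor: in practice the noise level is kept small (or
    annealed) near t = 0 to avoid the singular score term."""
    v = velocity_model(x, t)
    score = -(x + (1.0 - t) * v) / t
    drift = v - 0.5 * sigma ** 2 * score          # score-corrected drift
    noise = sigma * (dt ** 0.5) * torch.randn_like(x)
    return x - dt * drift + noise

# Usage sketch of Denoising Reduction: run RL rollouts with a reduced number
# of SDE steps (e.g. 10 calls to sde_step), while keeping the full step count
# (e.g. 40 calls to ode_step) for inference. Step counts here are illustrative.
```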
Authors (9)
Jie Liu
Gongye Liu
Jiajun Liang
Yangguang Li
Jiaheng Liu
Xintao Wang
+3 more
Key Contributions
Flow-GRPO is the first method to integrate online policy gradient RL into flow matching models. It uses an ODE-to-SDE conversion for RL exploration and a denoising reduction strategy for sampling efficiency, significantly improving generation quality and control, especially for compositional tasks.
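
As a rough illustration of the GRPO component, the sketch below computes a group-relative, PPO-style clipped objective over the per-step log-probabilities of an SDE denoising trajectory: several images are sampled per prompt, rewards are normalized within the group to form advantages, and a clipped ratio loss is applied at each denoising step. The clip range, the optional KL penalty, and the tensor shapes are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import torch

def flow_grpo_loss(logp_new, logp_old, rewards,
                   clip_eps=0.2, kl_coef=0.0, logp_ref=None):
    """Group-relative clipped policy-gradient loss (sketch).

    logp_new, logp_old: (G, T) per-step log-probs of each sampled denoising
        trajectory under the current and the sampling (old) policy;
        G = group size per prompt, T = number of training denoising steps.
    rewards: (G,) scalar reward per generated image for one prompt.
    """
    # Group-relative advantage: normalize rewards within the prompt's group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # (G,)
    adv = adv[:, None]                                          # broadcast over steps

    # PPO-style clipped surrogate on the per-step likelihood ratio.
    ratio = torch.exp(logp_new - logp_old)                      # (G, T)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    loss = -torch.minimum(unclipped, clipped).mean()

    # Optional KL penalty toward a frozen reference policy
    # (rough estimator; assumed form, not necessarily the paper's).
    if kl_coef > 0.0 and logp_ref is not None:
        loss = loss + kl_coef * (logp_new - logp_ref).mean()

    return loss
```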
Business Value
Enables the creation of more controllable and higher-fidelity generative models for applications like graphic design, advertising, and personalized content creation.