arxiv_cv | 98% Match | Research Paper | Researchers in Generative AI, Developers using Diffusion Models, Computer Vision Scientists | 1 month ago

Rectified Diffusion Guidance for Conditional Generation


Abstract

Classifier-Free Guidance (CFG), which combines the conditional and unconditional score functions with two coefficients summing to one, serves as a practical technique for diffusion model sampling. Theoretically, however, denoising with CFG cannot be expressed as a reciprocal diffusion process, which may consequently leave some hidden risks during use. In this work, we revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution. To rectify this issue, we propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory. We further show that our approach enjoys a closed-form solution given the guidance strength. That way, the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected. Empirical evidence on real-world data demonstrates the compatibility of our post-hoc design with existing state-of-the-art diffusion models, including both class-conditioned ones (e.g., EDM2 on ImageNet) and text-conditioned ones (e.g., SD3 on CC12M), without any retraining. Code is available at https://github.com/thuxmf/recfg.

Key Contributions

Proposes Rectified Classifier-Free Guidance (ReCFG) for diffusion models, which corrects the theoretical misalignment of standard CFG by relaxing guidance coefficients. This ensures denoising strictly aligns with diffusion theory and offers a closed-form solution for coefficients.
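To make the coefficient relaxation concrete, here is a minimal sketch of the two combination rules. It assumes the model returns a conditional and an unconditional prediction (a score or noise estimate) as NumPy arrays. The function names, the arguments w, gamma1, and gamma0, and the random stand-in arrays are hypothetical and not taken from the paper or its code. The sketch does not reproduce the paper's closed-form rule for choosing the rectified coefficients; it only shows that standard CFG is the special case where the two coefficients sum to one.

```python
import numpy as np


def cfg_combine(pred_cond, pred_uncond, w):
    """Standard CFG: the two coefficients are w and (1 - w), so they sum to one."""
    return w * pred_cond + (1.0 - w) * pred_uncond


def relaxed_combine(pred_cond, pred_uncond, gamma1, gamma0):
    """Relaxed form: gamma1 and gamma0 are free and need not sum to one.

    How the coefficients are chosen is left to the caller; the paper's
    closed-form solution is NOT implemented here.
    """
    return gamma1 * pred_cond + gamma0 * pred_uncond


# Stand-in predictions for a batch of 2 samples with 4 dimensions each.
rng = np.random.default_rng(0)
pred_cond = rng.standard_normal((2, 4))
pred_uncond = rng.standard_normal((2, 4))

# With gamma0 = 1 - gamma1 the relaxed form reduces to standard CFG.
guided = cfg_combine(pred_cond, pred_uncond, w=3.0)
same = relaxed_combine(pred_cond, pred_uncond, gamma1=3.0, gamma0=-2.0)
```

Because only the second coefficient changes, a sampler that already computes both predictions per step could swap one rule for the other without extra model evaluations, which fits the abstract's claim that sampling speed is barely affected.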

Business Value

Leads to more theoretically sound and potentially more controllable conditional generation from diffusion models, improving the reliability and quality of generated content for various applications.

Paper Metadata

Innovation Type

Algorithmic Improvement

Deployment Feasibility

High. ReCFG is a post-hoc modification to CFG that requires no retraining, and the abstract reports compatibility with existing models such as EDM2 and SD3, so it should integrate readily into existing diffusion model pipelines.

Limitations Addressed

Denoising with CFG not being a reciprocal diffusion process, Expectation shift of the generative distribution caused by summing-to-one coefficients, Hidden risks during CFG use

Technical Tags

classifier-free guidance (CFG), diffusion models, conditional generation, rectified CFG (ReCFG), diffusion theory, expectation shift, guidance coefficients, closed-form solution, generative distribution, sampling process

Research Topics

Generative Models, Diffusion Models, Conditional Generation, Sampling Techniques, Theoretical Analysis

Methods & Architectures

Rectified Classifier-Free Guidance (ReCFG), Relaxation of guidance coefficients, Pre-computation of coefficients, Diffusion Models

Applications & Tasks

Image Generation, Conditional Synthesis, Computer Vision, Theoretical limitations of standard CFG, Expectation shift in generative distribution with CFG, Improper configuration of guidance coefficients, Conditional Generation with Diffusion Models, Improving the theoretical grounding of CFG

Related Fields

Generative AI, Deep Learning, Computer Vision, Probability Theory

Keywords

Diffusion Models, Classifier-Free Guidance, CFG, Conditional Generation, Generative AI, Sampling, Theory, Rectified CFG, ReCFG, Image Generation

Academic Context

#Generative Models, #Diffusion Models, #Conditional Generation, #Sampling Techniques, #Theoretical Analysis

Commercial Potential

Potential Products

More stable and predictable generative models, Improved tools for controllable image synthesis, Libraries for advanced diffusion model sampling

Target Industries

Media and Entertainment, Gaming, Advertising, Design

Use Case Examples

Generating images with specific attributes more reliably, Improving the fidelity of text-to-image generation, Controlling the style and content of generated outputs

Competitive Edge

Offers a theoretically grounded improvement over standard Classifier-Free Guidance, addressing fundamental issues in the diffusion model sampling process.

Resource Requirements

Compute Needs

Minimal additional compute compared to standard CFG: the rectified coefficients are pre-computed once, leaving sampling speed barely affected.

Data Requirements

Requires access to observed data of the kind used to train the diffusion model, which is traversed once to pre-compute the rectified coefficients.

Deployment Constraints

The theoretical benefits need to translate into practical improvements across diverse generation tasks.

Scalability

The method is designed to be a modification of CFG, implying good scalability with existing diffusion model frameworks.
