📄 Abstract
Autoregressive (AR) models have emerged as powerful tools for image
generation by modeling images as sequences of discrete tokens. While
Classifier-Free Guidance (CFG) has been adopted to improve conditional
generation, its application in AR models faces two key issues: guidance
diminishing, where the conditional-unconditional gap quickly vanishes as
decoding progresses, and over-guidance, where strong conditions distort visual
coherence. To address these challenges, we propose SoftCFG, an
uncertainty-guided inference method that distributes adaptive perturbations
across all tokens in the sequence. The key idea behind SoftCFG is to let each
generated token contribute certainty-weighted guidance, ensuring that the
signal persists across steps while resolving conflicts between text guidance
and visual context. To further stabilize long-sequence generation, we introduce
Step Normalization, which bounds cumulative perturbations of SoftCFG. Our
method is training-free, model-agnostic, and seamlessly integrates with
existing AR pipelines. Experiments show that SoftCFG significantly improves
image quality over standard CFG and achieves state-of-the-art FID on ImageNet
256×256 among autoregressive models.
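To make the mechanism concrete, here is a minimal sketch of one guided decoding step. The standard CFG logit combination is well established; the certainty weighting and step normalization shown are illustrative stand-ins based only on the abstract's description (the function `softcfg_step`, the use of each prior token's top probability as its certainty, and the unit-norm bound are assumptions, not the paper's exact formulas).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softcfg_step(cond_logits, uncond_logits, prev_token_probs, scale=2.0):
    """One AR decoding step with certainty-weighted guidance (illustrative).

    cond_logits / uncond_logits: next-token logits with and without the
    condition. prev_token_probs: softmax distributions of previously
    generated tokens, whose max-probability serves as a certainty proxy.
    """
    # Standard classifier-free guidance in logit space
    guided = uncond_logits + scale * (cond_logits - uncond_logits)
    # Certainty weight: mean confidence of prior tokens (assumed form)
    certainty = (np.mean([p.max() for p in prev_token_probs])
                 if prev_token_probs else 1.0)
    # Certainty-weighted perturbation, bounded by a unit norm
    # (a stand-in for the paper's Step Normalization)
    perturb = certainty * (cond_logits - uncond_logits)
    norm = np.linalg.norm(perturb)
    if norm > 1.0:
        perturb /= norm
    return softmax(guided + perturb)
```

In practice such a step would be called once per token inside the AR sampling loop, with `prev_token_probs` growing as decoding progresses; because the certainty weights come from the model's own outputs, no retraining is required, matching the training-free claim above.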
Key Contributions
This paper introduces SoftCFG, an uncertainty-guided inference method for autoregressive models that stabilizes guidance during generation. It addresses guidance diminishing and over-guidance by distributing adaptive, certainty-weighted perturbations across tokens and introduces Step Normalization to manage cumulative perturbations, leading to more coherent long-sequence generation.
Business Value
Improves the controllability and quality of images generated by autoregressive models, making them more reliable for creative applications and reducing artifacts caused by unstable guidance.