📄 Abstract
Direct preference optimization (DPO) methods have shown strong potential in
aligning text-to-image diffusion models with human preferences by training on
paired comparisons. These methods improve training stability by avoiding the
REINFORCE algorithm but still struggle with challenges such as accurately
estimating image probabilities due to the non-linear nature of the sigmoid
function and the limited diversity of offline datasets. In this paper, we
introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new
preference learning framework grounded in inverse reinforcement learning.
Diffusion-DRO removes the dependency on a reward model by casting preference
learning as a ranking problem, thereby simplifying the training objective into
a denoising formulation and overcoming the non-linear estimation issues found
in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert
demonstrations with online policy-generated negative samples, enabling it to
effectively capture human preferences while addressing the limitations of
offline data. Comprehensive experiments show that Diffusion-DRO delivers
improved generation quality across a range of challenging and unseen prompts,
outperforming state-of-the-art baselines in both quantitative metrics and
user studies. Our source code and pre-trained models are available at
https://github.com/basiclab/DiffusionDRO.
Authors (4)
Yi-Lun Wu
Bo-Kai Ruan
Chiang Tseng
Hong-Han Shuai
Submitted
October 21, 2025
Key Contributions
Diffusion-DRO is a novel preference learning framework for diffusion models that avoids reward models by framing preference learning as a ranking problem solvable via denoising. It overcomes issues with probability estimation and dataset diversity by integrating offline expert demonstrations with online policy samples, offering improved training stability and alignment.
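To make the idea of "ranking via denoising" concrete, here is a minimal toy sketch. It is not the paper's objective: the hinge form, the `margin` parameter, and the function names `denoising_error` and `ranking_denoising_loss` are all assumptions chosen for illustration. It only shows one way a ranking constraint could be expressed with denoising errors, where an offline expert sample should be denoised more accurately than an online policy-generated negative sample.

```python
from typing import Sequence


def denoising_error(predicted_noise: Sequence[float], true_noise: Sequence[float]) -> float:
    """Mean squared error between the predicted and the true noise."""
    if len(predicted_noise) != len(true_noise) or len(true_noise) == 0:
        raise ValueError("noise vectors must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


def ranking_denoising_loss(
    expert_pred: Sequence[float],
    expert_noise: Sequence[float],
    policy_pred: Sequence[float],
    policy_noise: Sequence[float],
    margin: float = 0.1,  # hypothetical hyperparameter, not taken from the paper
) -> float:
    """Illustrative hinge ranking loss over denoising errors (an assumption).

    The loss is zero when the expert sample's denoising error is lower than
    the policy sample's error by at least `margin`, and positive otherwise.
    """
    expert_err = denoising_error(expert_pred, expert_noise)
    policy_err = denoising_error(policy_pred, policy_noise)
    return max(0.0, margin + expert_err - policy_err)
```

In this toy form, no reward model appears anywhere: the only signal is which sample (expert or policy-generated) the model denoises better. The released code at the repository linked in the abstract is the reference for the actual objective.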
Business Value
Enables the creation of more user-aligned and aesthetically pleasing AI-generated images, improving tools for artists, designers, and content creators, and potentially leading to more personalized visual content.