Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

📄 Abstract

Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm, but they still struggle to estimate image probabilities accurately, owing to the non-linear sigmoid link in their objective, and they are constrained by the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences while addressing the limitations of offline data. Comprehensive experiments show that Diffusion-DRO delivers improved generation quality across a range of challenging and unseen prompts, outperforming state-of-the-art baselines in both quantitative metrics and user studies. Our source code and pre-trained models are available at https://github.com/basiclab/DiffusionDRO.
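
For intuition, the sketch below illustrates in PyTorch what casting preference learning as a ranking over denoising errors can look like: the model should reconstruct noise better on expert (preferred) images than on negatives. This is a stand-in under stated assumptions, not the paper's derived objective; the hinge surrogate, the epsilon-prediction interface `model(x_t, t, cond)`, and all function names here are illustrative.

```python
# Hypothetical sketch of "preference learning as denoising ranking".
# The hinge surrogate and all names are illustrative assumptions; the
# paper derives its own objective from inverse reinforcement learning.
import torch
import torch.nn.functional as F

def denoising_error(model, x0, t, noise, alpha_bar, cond):
    """Per-sample MSE between true and predicted noise at timestep t."""
    a = alpha_bar[t].view(-1, 1, 1, 1)               # cumulative alphas, (B,1,1,1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise   # forward diffusion q(x_t | x_0)
    pred = model(x_t, t, cond)                       # assumed eps-prediction network
    return F.mse_loss(pred, noise, reduction="none").flatten(1).mean(dim=1)

def ranking_denoising_loss(model, x_expert, x_neg, t, alpha_bar, cond, margin=0.0):
    """Rank expert demonstrations above policy-generated negatives by
    denoising error, using a linear hinge rather than a sigmoid link."""
    noise = torch.randn_like(x_expert)               # shared noise keeps the comparison paired
    err_w = denoising_error(model, x_expert, t, noise, alpha_bar, cond)
    err_l = denoising_error(model, x_neg, t, noise, alpha_bar, cond)
    return F.relu(err_w - err_l + margin).mean()
```

Sharing the same noise and timesteps across both images keeps the comparison paired, echoing how DPO-style methods train on paired comparisons, while the linear hinge mirrors the paper's stated goal of avoiding sigmoid-based probability estimates.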
Authors (4)
Yi-Lun Wu
Bo-Kai Ruan
Chiang Tseng
Hong-Han Shuai
Submitted
October 21, 2025
arXiv Category
cs.CV

Key Contributions

Diffusion-DRO is a preference learning framework for diffusion models that avoids reward models by framing preference learning as a ranking problem solvable via denoising. Its denoising formulation sidesteps the probability-estimation issues of prior DPO-style objectives, while mixing offline expert demonstrations with online policy-generated negatives addresses the limited diversity of offline data, yielding improved training stability and alignment. A hypothetical sketch of such a mixed online/offline training step follows.
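
The step below combines offline expert demonstrations with negatives sampled online from the current policy, reusing `ranking_denoising_loss` and `alpha_bar` from the earlier sketch. `sample_from_policy`, the `expert_batch` layout, and `train_step` are assumed names for illustration, not taken from the paper's code.

```python
# Hypothetical online/offline training step; not the paper's actual code.
import torch

@torch.no_grad()
def sample_from_policy(model, prompts, shape):
    """Placeholder for running the current diffusion policy (e.g. a DDIM
    sampler) on the prompts; returns noise here to keep the sketch runnable."""
    return torch.randn(shape)

def train_step(model, optimizer, expert_batch, alpha_bar, num_timesteps=1000):
    x_expert, prompts, cond = expert_batch           # offline expert positives
    x_neg = sample_from_policy(model, prompts, x_expert.shape).to(x_expert.device)
    t = torch.randint(0, num_timesteps, (x_expert.size(0),), device=x_expert.device)
    loss = ranking_denoising_loss(model, x_expert, x_neg, t, alpha_bar, cond)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Regenerating negatives from the evolving policy at each step is what distinguishes this from purely offline DPO-style training: the negative distribution tracks the current model rather than a fixed dataset.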

Business Value

Enables the creation of more user-aligned and aesthetically pleasing AI-generated images, improving tools for artists, designers, and content creators, and potentially leading to more personalized visual content.