Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

Abstract

Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
Authors (8)
Zhuoran Jin
Hongbang Yuan
Kejian Zhu
Jiachun Li
Pengfei Cao
Yubo Chen
+2 more
Submitted
October 27, 2025
arXiv Category
cs.CL

Key Contributions

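The paper introduces Omni-Reward, a framework for generalist omni-modal reward modeling that addresses modality imbalance and preference rigidity by supporting free-form preferences. It has three parts: Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities; Omni-RewardData, a multimodal preference dataset of 248K general preference pairs and 69K instruction-tuning pairs; and Omni-RewardModel, which includes both discriminative and generative RMs.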
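
To make the idea of free-form preferences concrete, here is a minimal Python sketch of how a reward model might be scored on preference pairs that carry a natural-language criterion. Everything in it is an assumption made for illustration: the `PreferencePair` fields, the `pairwise_accuracy` helper, and the `toy_score` function are hypothetical and are not the schema, metric code, or model from the paper. The point it shows is that the same two responses can have opposite preferred labels under different criteria, which a fixed binary pair cannot express.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PreferencePair:
    """One comparison item (hypothetical schema, not the paper's)."""
    prompt: str
    response_a: str
    response_b: str
    criterion: str  # free-form description of what the user prefers
    preferred: str  # "a" or "b", the label under that criterion


# A scorer maps (prompt, response, criterion) to a scalar reward.
ScoreFn = Callable[[str, str, str], float]


def pairwise_accuracy(pairs: Sequence[PreferencePair], score: ScoreFn) -> float:
    """Fraction of pairs where the scorer ranks the preferred response higher."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = 0
    for p in pairs:
        score_a = score(p.prompt, p.response_a, p.criterion)
        score_b = score(p.prompt, p.response_b, p.criterion)
        predicted = "a" if score_a > score_b else "b"  # ties fall to "b"
        correct += predicted == p.preferred
    return correct / len(pairs)


def toy_score(prompt: str, response: str, criterion: str) -> float:
    """Stand-in for a real reward model: rewards brevity only when asked to."""
    n_words = len(response.split())
    return -n_words if "concise" in criterion.lower() else n_words


SHORT = "A dog runs."
LONG = "A brown dog runs across a sunny field."

# Same prompt and responses, opposite labels under different criteria.
pairs = [
    PreferencePair("Describe the clip.", SHORT, LONG, "Prefer concise answers.", "a"),
    PreferencePair("Describe the clip.", SHORT, LONG, "Prefer detailed answers.", "b"),
]

accuracy = pairwise_accuracy(pairs, toy_score)
```

A scorer that ignores the criterion (for example, one that always favors the longer response) can get at most one of these two pairs right, which is the kind of failure a benchmark with free-form preferences is meant to expose.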

Business Value

Supports the development of more versatile, human-aligned AI systems that can evaluate and respond to user preferences across text, image, video, audio, and 3D content, which may in turn make human-AI collaboration more natural and effective.