
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences


Abstract

Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
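To make the contrast between fixed binary pairs and free-form preferences concrete, here is a minimal sketch. The abstract does not state the training objective or data schema, so everything below is an assumption: the `PreferencePair` field names are hypothetical, and the loss is the standard Bradley-Terry pairwise objective commonly used for discriminative reward models, not necessarily the one used for Omni-RewardModel.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record layout; field names are illustrative, not the paper's schema."""
    modality: str   # e.g. "text", "image", "video", "audio", "3d"
    prompt: str
    criterion: str  # free-form, natural-language description of what the user prefers
    chosen: str     # reference to the preferred response
    rejected: str   # reference to the less preferred response


def bradley_terry_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(score_chosen - score_rejected))."""
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


pair = PreferencePair(
    modality="audio",
    prompt="Narrate this paragraph.",
    criterion="I prefer a calm, slow delivery.",
    chosen="clip_a",
    rejected="clip_b",
)
# The loss shrinks as the model scores the chosen response above the rejected one.
loss_good = bradley_terry_loss(2.0, 0.0)
loss_bad = bradley_terry_loss(0.0, 2.0)
```

In this sketch the free-form `criterion` travels with each pair, so the same two responses could be ranked differently under a different stated preference, which is the flexibility a fixed binary label cannot express.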

Authors (8)

Zhuoran Jin

Hongbang Yuan

Kejian Zhu

Jiachun Li

Pengfei Cao

Yubo Chen

+2 more

Submitted

October 27, 2025

arXiv Category

cs.CL


Key Contributions

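Introduces Omni-Reward, a framework for generalist omni-modal reward modeling that addresses modality imbalance and preference rigidity by supporting free-form preferences. It includes Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences (nine tasks across five modalities); Omni-RewardData, a multimodal preference dataset of 248K general preference pairs and 69K instruction-tuning pairs; and Omni-RewardModel, which includes both discriminative and generative RMs.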

Business Value

Enables the development of more versatile and human-aligned AI systems that can understand and respond to preferences across a wide range of modalities, leading to more natural and effective human-AI collaboration.

Paper Metadata

Innovation Type

Dataset and Benchmark creation, Framework development

Deployment Feasibility

High, as it provides tools (benchmark, dataset) for developing and evaluating omni-modal reward models, which lowers the barrier to adopting them.

Limitations Addressed

Addresses the limitations of existing reward models that are primarily text/image-focused (modality imbalance) and rely on fixed binary preferences (preference rigidity).

Technical Tags

Reward modeling, Omni-modal, Multimodal AI, Free-form preferences, Omni-modal benchmark, Preference dataset, Generalist models, Text, Image, Video, Audio, 3D

Research Topics

AI Alignment, Multimodal Learning, Human Preference Modeling, Generalist AI Agents

Methods & Architectures

Omni-RewardBench (benchmark), Omni-RewardData (dataset), Omni-Reward (model/framework), Generalist omni-modal reward models

Applications & Tasks

AI Alignment, Multimodal AI, Human-AI Interaction; Modality imbalance in reward models, Preference rigidity in reward models, Lack of omni-modal benchmarks; Training generalist omni-modal reward models, Evaluating reward models across multiple modalities, Capturing diverse human preferences

Datasets & Benchmarks

Datasets

Omni-RewardBench, Omni-RewardData

Benchmarks

Omni-RewardBench
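As a rough illustration of what evaluating a reward model across modalities can look like, the sketch below computes pairwise accuracy per modality. It is not the Omni-RewardBench protocol, which this page does not describe: the tuple layout, the function names, the toy `overlap_score` scorer, and the handling of ties (counted as wrong) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical sample layout: (modality, free-form criterion, chosen, rejected).
Sample = Tuple[str, str, str, str]


def accuracy_by_modality(
    samples: Iterable[Sample],
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Fraction of pairs, per modality, where the chosen response scores strictly higher."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for modality, criterion, chosen, rejected in samples:
        total[modality] += 1
        # Ties count as incorrect in this sketch.
        if score(criterion, chosen) > score(criterion, rejected):
            correct[modality] += 1
    return {m: correct[m] / total[m] for m in total}


def overlap_score(criterion: str, response: str) -> float:
    """Toy stand-in for a reward model: words shared by criterion and response."""
    shared = set(criterion.lower().split()) & set(response.lower().split())
    return float(len(shared))


toy_samples = [
    ("text", "prefer concise answers", "concise answers here", "long rambling reply"),
    ("audio", "calm narration", "soft voice", "calm narration sample"),
]
result = accuracy_by_modality(toy_samples, overlap_score)
```

Reporting accuracy separately for each modality, rather than one pooled number, is what would expose the modality imbalance described in the abstract: a scorer can look strong on text while failing on audio, as the toy scorer does here.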

Related Fields

Natural Language Processing, Computer Vision, Speech Processing, Reinforcement Learning, Human-Computer Interaction

Keywords

Reward modeling, Omni-modal, Multimodal AI, AI alignment, Human preferences, Free-form preferences, Benchmark, Dataset, Text, Image, Video, Audio, 3D, Generalist models

Academic Context

#AI Alignment, #Multimodal Learning, #Human Preference Modeling, #Generalist AI Agents

Commercial Potential

Potential Products

AI assistants with broader modality understanding, Content generation tools, Personalized AI experiences

Target Industries

Technology, Media and Entertainment, Education, Gaming

Use Case Examples

AI that can generate video descriptions based on a user's spoken feedback; AI that can create 3D models based on multimodal instructions; Personalized AI tutors that adapt to a user's preferred communication style

Competitive Edge

Pioneers omni-modal reward modeling with free-form preferences, offering a more comprehensive and flexible approach compared to existing modality-specific or rigidly-defined preference models.

Market Opportunity

Significant market potential as AI systems become more integrated into daily life and require nuanced alignment.

Revenue Models

Licensing of the Omni-Reward framework; services for custom reward model development.

Resource Requirements

Compute Needs

High, for training large omni-modal reward models and processing large datasets.

Data Requirements

Large-scale, diverse multimodal preference data (Omni-RewardData).

Deployment Constraints

Requires significant computational resources for training and inference; potential challenges in collecting diverse and high-quality free-form preference data.

Scalability

The framework is designed for scalability, with the benchmark and dataset supporting the development of increasingly capable generalist models.

Regulatory Considerations

Ethical considerations regarding data privacy and the potential for misuse of powerful alignment models.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years for widespread adoption in advanced AI systems.
