📄 Abstract
Abstract: Multimodal Large Language Models have been progressing from uni-modal
understanding toward unifying the visual, audio, and language modalities; such
models are collectively termed omni models. However, the correlation between
uni-modal and omni-modal capabilities remains unclear, and comprehensive
evaluation is needed to drive the evolution of omni models' intelligence. In
this work, we propose a novel, high-quality, and diverse omni model benchmark,
MultiModal All in One Benchmark (MMAO-Bench), which effectively assesses both
uni-modal and omni-modal understanding capabilities. The benchmark consists of
1880 human-curated samples across 44 task types, and an innovative multi-step
open-ended question type that better assesses complex reasoning tasks.
Experimental results show a compositional law between cross-modal and
uni-modal performance, and that omni-modal capability manifests as a
bottleneck effect on weak models while exhibiting synergistic promotion on
strong models.
Authors (9)
Chen Chen
ZeYang Hu
Fengjiao Chen
Liya Ma
Jiaxing Liu
Xiaoyu Li
+3 more
Submitted
October 21, 2025
Key Contributions
This paper introduces MMAO-Bench, a novel, high-quality benchmark for evaluating omni models, that is, Multimodal Large Language Models that unify the visual, audio, and language modalities. It assesses both uni-modal and omni-modal understanding capabilities and reveals a compositional law between them: omni-modal capability acts as a bottleneck on weak models and shows synergistic promotion on strong models.
Business Value
Provides a standardized way to measure and compare the progress of multimodal AI, accelerating development and identifying areas for improvement in creating more capable and versatile AI systems.