📄 Abstract
We propose Ming-Flash-Omni, an upgraded version of Ming-Omni, built upon a
sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0 with 100 billion
total parameters, of which only 6.1 billion are active per token. This
architecture enables highly efficient scaling (dramatically improving
computational efficiency while significantly expanding model capacity) and
empowers stronger unified multimodal intelligence across vision, speech, and
language, representing a key step toward Artificial General Intelligence (AGI).
Compared to its predecessor, the upgraded version exhibits substantial
improvements across multimodal understanding and generation. We significantly
advance speech recognition capabilities, achieving state-of-the-art performance
in contextual ASR and highly competitive results in dialect-aware ASR. In image
generation, Ming-Flash-Omni introduces high-fidelity text rendering and
demonstrates marked gains in scene consistency and identity preservation during
image editing. Furthermore, Ming-Flash-Omni introduces generative segmentation,
a capability that not only achieves strong standalone segmentation performance
but also enhances spatial control in image generation and improves editing
consistency. Notably, Ming-Flash-Omni achieves state-of-the-art results in
text-to-image generation and generative segmentation, and sets new records on
all 12 contextual ASR benchmarks, all within a single unified architecture.
Authors (58): Inclusion AI: Bowen Ma, Cheng Zou, Canxiang Yan, Chunxiang Jin, and 52 more
Submitted: October 28, 2025
Key Contributions
Ming-Flash-Omni presents a sparse Mixture-of-Experts (MoE) architecture with 100B total parameters (6.1B active per token), achieving highly efficient scaling and unified multimodal intelligence across vision, speech, and language. It achieves state-of-the-art performance in contextual ASR and significantly advances image generation quality, marking a step toward AGI.
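The central efficiency claim is that only 6.1B of the 100B parameters are active for any given token. Sparse MoE models commonly achieve this with top-k routing: a router scores every expert, but only the k highest-scoring experts are evaluated for each token. The sketch below illustrates that general technique in plain Python. The expert count, top-k value, hidden size, router weights, and toy "experts" are all hypothetical. This is not the routing scheme of Ling-Flash-2.0 or Ming-Flash-Omni, which the abstract does not describe.

```python
import math
import random

# Hypothetical configuration for illustration only; these are NOT the
# actual Ling-Flash-2.0 / Ming-Flash-Omni hyperparameters.
NUM_EXPERTS = 8
TOP_K = 2
D_MODEL = 4


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(token, router_weights, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    # Router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(token, router_weights, experts, k=TOP_K):
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(token)
    for idx, gate in route(token, router_weights, k):
        y = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


def make_expert(scale):
    """Toy expert (hypothetical): elementwise scaling standing in for a feed-forward block."""
    return lambda x: [scale * v for v in x]


def active_fraction(active_params, total_params):
    """Fraction of parameters used per token."""
    return active_params / total_params


if __name__ == "__main__":
    rng = random.Random(0)
    router = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]
    experts = [make_expert(s + 1.0) for s in range(NUM_EXPERTS)]
    token = [0.5, -0.2, 0.1, 0.9]
    print(route(token, router))
    print(moe_layer(token, router, experts))
    # Parameter counts stated in the abstract: 6.1B active out of 100B total.
    print(f"{active_fraction(6.1e9, 100e9):.1%} of parameters active per token")
```

Running the script prints the two selected experts with their renormalized gate weights, the combined output, and the active-parameter share (6.1%). In a sparse MoE, per-token compute scales roughly with the active parameters. All 100B parameters, however, must still be stored and available to the router.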
Business Value
Points toward more capable and efficient AI systems that understand and generate content across vision, speech, and language within a single model. Such systems could accelerate progress toward AGI and enable new applications across industries.