📄 Abstract
Multimodal learning systems often encounter challenges related to modality
imbalance, where a dominant modality may overshadow others, thereby hindering
the learning of weak modalities. Conventional approaches often force weak
modalities to align with dominant ones, a "Learning to be (the same)" strategy (Positive
Learning), which risks suppressing the unique information inherent in the weak
modalities. To address this challenge, we offer a new learning paradigm:
"Learning Not to be" (Negative Learning). Instead of enhancing weak modalities'
target-class predictions, the dominant modalities dynamically guide the weak
modality to suppress non-target classes. This stabilizes the decision space and
lets weak modalities retain their modality-specific information without being
over-aligned with the dominant ones. We then analyze multimodal
learning from a robustness perspective and theoretically derive the Multimodal
Negative Learning (MNL) framework, which introduces a dynamic guidance
mechanism tailored for negative learning. Our method provably tightens the
robustness lower bound of multimodal learning by increasing the Unimodal
Confidence Margin (UCoM) and reduces the empirical error of weak modalities,
particularly under noisy and imbalanced scenarios. Extensive experiments across
multiple benchmarks demonstrate the effectiveness and generalizability of our
approach against competing methods. The code will be available at
https://github.com/BaoquanGong/Multimodal-Negative-Learning.git.
Authors (5)
Baoquan Gong
Xiyuan Gao
Pengfei Zhu
Qinghua Hu
Bing Cao
Submitted
October 23, 2025
Key Contributions
Introduces 'Negative Learning' ('Learning Not to be') as a novel paradigm for multimodal learning, in contrast to conventional 'Positive Learning'. In MNL, dominant modalities guide weak modalities to suppress non-target classes, which stabilizes the decision space and preserves modality-specific information, thereby addressing modality imbalance and improving robustness.
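To make the idea of suppressing non-target classes concrete, here is a minimal NumPy sketch of a negative-learning style loss. It is not the paper's MNL objective: the weighting scheme (each non-target class weighted by how strongly the dominant modality rejects it), the function names `negative_learning_loss` and `confidence_margin`, and the margin definition (target probability minus the largest non-target probability) are all assumptions made for illustration. The paper's dynamic guidance mechanism and its UCoM definition may differ.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def negative_learning_loss(weak_logits, dominant_logits, targets, eps=1e-12):
    """Hypothetical negative-learning loss (illustrative, not the paper's).

    Penalizes probability mass that the weak modality places on non-target
    classes. The target class itself does not appear in the loss, so the weak
    modality is not pushed to copy the dominant modality's target prediction.
    """
    p_weak = softmax(weak_logits)
    p_dom = softmax(dominant_logits)
    n, c = p_weak.shape

    # Boolean mask that is True for every non-target class.
    non_target = np.ones((n, c), dtype=bool)
    non_target[np.arange(n), targets] = False

    # Assumed guidance: weight each non-target class by 1 - p_dom, then
    # normalize the weights per sample.
    weights = (1.0 - p_dom) * non_target
    weights = weights / (weights.sum(axis=1, keepdims=True) + eps)

    # -log(1 - p) grows as the weak modality assigns more mass to a class.
    per_class = -np.log(np.clip(1.0 - p_weak, eps, 1.0))
    return float((weights * per_class).sum(axis=1).mean())

def confidence_margin(logits, targets):
    # Assumed margin: target probability minus the largest non-target probability.
    p = softmax(logits)
    rows = np.arange(p.shape[0])
    target_p = p[rows, targets]
    masked = p.copy()
    masked[rows, targets] = -np.inf
    return target_p - masked.max(axis=1)
```

Under these assumptions, a weak modality that puts most of its probability on a wrong class incurs a larger loss and a negative margin, while one that keeps non-target probabilities low incurs a small loss and a positive margin, regardless of how closely its target-class score matches the dominant modality.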
Business Value
Enables the development of more robust and versatile multimodal AI systems that can effectively leverage information from diverse data sources, even when some sources are less informative or noisy. This is critical for complex real-world applications.