📄 Abstract
Demand for interpretability in deep learning is growing, especially in
high-stakes domains. Concept Bottleneck Models (CBMs) address this demand by inserting
human-understandable concepts into the prediction pipeline, but they are
generally single-modal and ignore structured concept relationships. To overcome
these limitations, we present MoE-SGT, a reasoning-driven framework that
augments CBMs with a structure-injecting Graph Transformer and a Mixture of
Experts (MoE) module. We construct answer-concept and answer-question graphs
for multimodal inputs to explicitly model the structured relationships among
concepts. We then integrate a Graph Transformer to capture multi-level
dependencies, addressing the limitations of traditional CBMs
in modeling concept interactions. However, the resulting model still encounters
bottlenecks in adapting to complex concept patterns. We therefore replace the
feed-forward layers with an MoE module, giving the model
greater capacity to learn diverse concept relationships while
dynamically allocating reasoning tasks to different sub-experts, thereby
significantly enhancing the model's adaptability to complex concept reasoning.
MoE-SGT achieves higher accuracy than other concept bottleneck networks on
multiple datasets by modeling structured relationships among concepts and
utilizing a dynamic expert selection mechanism.
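
To make the "dynamic expert selection" idea concrete, the following is a minimal sketch of a generic top-k gated Mixture of Experts layer standing in for a single feed-forward layer. It is not the authors' implementation: the abstract does not specify the gating function, the number of experts, or the value of k, so the class names (`Expert`, `MoELayer`), the top-k softmax gate, and all sizes below are assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Expert:
    """Hypothetical expert: a two-layer feed-forward net, W2 @ relu(W1 @ x)."""

    def __init__(self, dim, hidden, rng):
        self.W1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.W1, x)]
        return matvec(self.W2, h)


class MoELayer:
    """Assumed design: a linear gate scores experts, the top-k are mixed."""

    def __init__(self, dim, hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(dim, hidden, rng) for _ in range(num_experts)]
        self.gate = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k experts."""
        logits = matvec(self.gate, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        # Renormalise over the selected experts only.
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def __call__(self, x):
        out = [0.0] * len(x)
        for idx, w in self.route(x):
            y = self.experts[idx](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


# Usage: route one 4-dimensional token through 2 of 4 experts.
layer = MoELayer(dim=4, hidden=8, num_experts=4, top_k=2)
token = [0.5, -1.0, 0.25, 2.0]
routing = layer.route(token)
output = layer(token)
```

Only the selected experts run for a given input, which is how a layer of this kind can add capacity without every input paying for every expert.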
Key Contributions
This paper presents MoE-SGT, a reasoning-driven framework that enhances Concept Bottleneck Models (CBMs) by integrating a Graph Transformer and a Mixture of Experts (MoE) module. It explicitly models structured relationships among concepts using answer-concept and answer-question graphs for multimodal inputs, capturing multi-level dependencies and adapting to complex concept patterns, thereby improving interpretability and performance.
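
One common way to inject graph structure into a Transformer is to restrict attention to graph neighbours. The sketch below illustrates that general idea on a toy answer-concept graph. The summary does not say how MoE-SGT builds its edges or injects structure, so the node names, the edge list, the self-loops, and the masking scheme are all hypothetical.

```python
import math


def build_adjacency(nodes, edges):
    """Symmetric boolean adjacency matrix with self-loops (an assumption)."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[i == j for j in range(n)] for i in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        adj[i][j] = adj[j][i] = True
    return adj


def masked_attention_weights(scores, adj):
    """Row-wise softmax over scores, with non-neighbours masked out."""
    weights = []
    for row_scores, row_adj in zip(scores, adj):
        masked = [s if keep else -math.inf for s, keep in zip(row_scores, row_adj)]
        m = max(masked)  # finite, because every node keeps its self-loop
        exps = [math.exp(s - m) for s in masked]  # masked entries become 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical toy graph: two answers linked to the concepts that support them.
nodes = ["answer:yes", "answer:no", "concept:striped", "concept:four_legs"]
edges = [
    ("answer:yes", "concept:striped"),
    ("answer:yes", "concept:four_legs"),
    ("answer:no", "concept:four_legs"),
]
adj = build_adjacency(nodes, edges)

# Uniform raw scores, so the resulting weights reflect the graph alone.
scores = [[0.0] * len(nodes) for _ in nodes]
weights = masked_attention_weights(scores, adj)
```

With uniform scores, each node spreads its attention evenly over itself and its neighbours, and gives zero weight to nodes it is not connected to.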
Business Value
Enables more transparent and reliable AI decision-making in complex multimodal scenarios, crucial for high-stakes industries where understanding the 'why' behind a prediction is as important as the prediction itself.