SheafAlign: A Sheaf-theoretic Framework for Decentralized Multimodal Alignment

📄 Abstract

Conventional multimodal alignment methods assume mutual redundancy across all modalities, an assumption that fails in real-world distributed scenarios. We propose SheafAlign, a sheaf-theoretic framework for decentralized multimodal alignment that replaces single-space alignment with multiple comparison spaces. This approach models pairwise modality relations through sheaf structures and leverages decentralized contrastive learning-based objectives for training. SheafAlign overcomes the limitations of prior methods by not requiring mutual redundancy among all modalities, preserving both shared and unique information. Experiments on multimodal sensing datasets show superior zero-shot generalization, cross-modal alignment, and robustness to missing modalities, with 50% lower communication cost than state-of-the-art baselines.
Authors: Abdulmomen Ghalkha, Zhuojun Tian, Chaouki Ben Issaid, Mehdi Bennis
Submitted: October 23, 2025
arXiv category: cs.LG

Key Contributions

SheafAlign is a sheaf-theoretic framework for decentralized multimodal alignment. Instead of aligning every modality in a single shared space, it uses multiple comparison spaces, models pairwise modality relations with sheaf structures, and trains with decentralized contrastive learning objectives. Because it does not require mutual redundancy across all modalities, it preserves both shared and modality-specific information. On multimodal sensing datasets, the authors report superior zero-shot generalization, cross-modal alignment, and robustness to missing modalities, with 50% lower communication cost than state-of-the-art baselines.
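
The idea of per-pair comparison spaces can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes linear restriction maps (one pair of projection matrices per edge of the modality graph) and a symmetric InfoNCE loss as the contrastive objective. The modality names, dimensions, and function names are hypothetical, and the decentralized training and communication protocol are not modeled.

```python
import numpy as np


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss between two row-aligned batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) scaled cosine similarities

    def cross_entropy(l):
        # Row i's positive is column i; all other columns act as negatives.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def pairwise_alignment_loss(embeddings, restriction_maps, edges, temperature=0.1):
    """Average contrastive loss over edges, each in its own comparison space.

    Assumption: restriction_maps[(i, j)] holds two matrices that project
    modality i and modality j into the comparison space of edge (i, j).
    Edges touching a modality absent from `embeddings` are skipped.
    """
    active = [(i, j) for (i, j) in edges if i in embeddings and j in embeddings]
    if not active:
        raise ValueError("no edge has both of its modalities available")
    total = 0.0
    for i, j in active:
        map_i, map_j = restriction_maps[(i, j)]
        total += info_nce(embeddings[i] @ map_i, embeddings[j] @ map_j, temperature)
    return total / len(active)


# Toy setup with hypothetical modalities and random data.
rng = np.random.default_rng(0)
batch, compare_dim = 32, 16
dims = {"radar": 24, "camera": 48, "imu": 12}
embeddings = {m: rng.normal(size=(batch, d)) for m, d in dims.items()}

# Only two pairs are compared; radar and imu never share a space.
edges = [("radar", "camera"), ("camera", "imu")]
restriction_maps = {
    (i, j): (
        rng.normal(size=(dims[i], compare_dim)),
        rng.normal(size=(dims[j], compare_dim)),
    )
    for i, j in edges
}

loss = pairwise_alignment_loss(embeddings, restriction_maps, edges)
print(f"mean pairwise contrastive loss: {loss:.3f}")
```

In this sketch the radar and imu embeddings are never forced into a common space, which mirrors the claim that alignment does not need redundancy across all modalities. Dropping a modality from `embeddings` simply removes its edges from the average, a simplified stand-in for the missing-modality setting described above.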

Business Value

SheafAlign enables more efficient and robust multimodal data fusion in decentralized systems such as sensor networks and collaborative robotics, where data is distributed across nodes and individual modalities may be incomplete or unreliable.