📄 Abstract
Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-range dependencies on top of local features extracted by conventional feature pyramid networks. However, the quadratic complexity of Transformer-based MVS methods makes it difficult to balance performance and efficiency. Motivated by the global modeling capability and linear complexity of the Mamba architecture, we propose MVSMamba, the first Mamba-based MVS network. MVSMamba enables efficient global feature aggregation with minimal computational overhead. To fully exploit Mamba's potential in MVS, we propose a Dynamic Mamba module (DM-module) based on a novel reference-centered dynamic scanning strategy, which enables: (1) efficient intra- and inter-view feature interaction from the reference view to the source views, (2) omnidirectional multi-view feature representations, and (3) multi-scale global feature aggregation. Extensive experimental results demonstrate that MVSMamba outperforms state-of-the-art MVS methods on the DTU dataset and the Tanks-and-Temples benchmark in both performance and efficiency. The source code is available at https://github.com/JianfeiJ/MVSMamba.
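The linear-versus-quadratic argument above can be illustrated with a toy recurrence. The sketch below is a scalar-state simplification of the state-space recurrence that Mamba-style layers evaluate. It assumes the per-step coefficients `a`, `b`, `c` are already given (in Mamba they are computed from the input, which is what makes the scan "selective"). It is not the MVSMamba implementation or the official Mamba kernel, and `selective_scan` and `attention_pair_count` are hypothetical names used only for illustration.

```python
from typing import List


def selective_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Toy scalar-state recurrence (hypothetical simplification, not the MVSMamba kernel):
        h_t = a_t * h_{t-1} + b_t * x_t
        y_t = c_t * h_t
    A single left-to-right pass, so the work grows linearly with sequence length."""
    h = 0.0
    y: List[float] = []
    for xt, at, bt, ct in zip(x, a, b, c):
        h = at * h + bt * xt  # carry global context forward in a fixed-size state
        y.append(ct * h)      # read out one value per token
    return y


def attention_pair_count(length: int) -> int:
    """Full self-attention scores every token against every token: length * length pairs."""
    return length * length
```

For example, `selective_scan([1, 2, 3], [0.5] * 3, [1] * 3, [1] * 3)` returns `[1.0, 2.5, 4.25]` after three update steps, whereas full attention over the same three tokens scores `attention_pair_count(3) == 9` pairs. The gap widens as the number of tokens grows, since one term is linear and the other quadratic.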
Authors (7)
Jianfei Jiang
Qiankun Liu
Hongyuan Liu
Haochen Yu
Liyong Wang
Jiansheng Chen
+1 more
Submitted
November 3, 2025
Key Contributions
MVSMamba is the first Mamba-based MVS network, offering efficient global feature aggregation with linear complexity and avoiding the quadratic complexity of Transformer-based methods. It introduces a Dynamic Mamba module with a reference-centered dynamic scanning strategy that provides intra- and inter-view feature interaction, omnidirectional representations, and multi-scale global feature aggregation.
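One possible reading of a "reference-centered" scan is offered here only as a hypothetical illustration. For each reference/source pair, the reference view's tokens are placed first and the source view's tokens second, so that a causal scan always carries reference information into the source view. The spatial order is varied over four directions to approximate an omnidirectional representation. The exact ordering, the "dynamic" choice of directions, and the multi-scale handling used in MVSMamba are not described on this page (see the linked repository). `spatial_order`, `reference_centered_sequences`, and the direction names are assumptions.

```python
from typing import Dict, List, Tuple

# A token is identified by (view name, row, column) on a feature map.
Token = Tuple[str, int, int]

# Hypothetical set of four scan directions used to approximate "omnidirectional" coverage.
DIRECTIONS = ("row", "row_rev", "col", "col_rev")


def spatial_order(h: int, w: int, direction: str) -> List[Tuple[int, int]]:
    """Return the (row, col) visiting order for one of the four scan directions."""
    row_major = [(i, j) for i in range(h) for j in range(w)]
    col_major = [(i, j) for j in range(w) for i in range(h)]
    orders = {
        "row": row_major,
        "row_rev": row_major[::-1],
        "col": col_major,
        "col_rev": col_major[::-1],
    }
    return orders[direction]


def reference_centered_sequences(
    h: int, w: int, source_views: List[str]
) -> Dict[Tuple[str, str], List[Token]]:
    """Hypothetical scan construction: in every sequence the reference tokens precede
    the source tokens, so a left-to-right scan propagates reference features into
    each source view. One sequence per (source view, direction) pair."""
    sequences: Dict[Tuple[str, str], List[Token]] = {}
    for src in source_views:
        for direction in DIRECTIONS:
            order = spatial_order(h, w, direction)
            ref_part = [("ref", i, j) for i, j in order]
            src_part = [(src, i, j) for i, j in order]
            sequences[(src, direction)] = ref_part + src_part
    return sequences
```

With a 2×3 feature map and two source views, `reference_centered_sequences(2, 3, ["src1", "src2"])` yields 8 sequences of 12 tokens each, and the first 6 tokens of every sequence belong to the reference view. Keeping the reference block first in every direction, rather than reversing the whole sequence, is what keeps the information flow reference-to-source in this sketch.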
Business Value
Enables more efficient and accurate 3D reconstruction for applications like autonomous driving, robotics, and virtual/augmented reality, potentially reducing computational costs and improving real-time performance.