MVSMamba: Multi-View Stereo with State Space Model

Abstract

Robust feature representations are essential for learning-based Multi-View Stereo (MVS), which relies on accurate feature matching. Recent MVS methods leverage Transformers to capture long-range dependencies based on local features extracted by conventional feature pyramid networks. However, the quadratic complexity of Transformer-based MVS methods poses challenges to balance performance and efficiency. Motivated by the global modeling capability and linear complexity of the Mamba architecture, we propose MVSMamba, the first Mamba-based MVS network. MVSMamba enables efficient global feature aggregation with minimal computational overhead. To fully exploit Mamba's potential in MVS, we propose a Dynamic Mamba module (DM-module) based on a novel reference-centered dynamic scanning strategy, which enables: (1) Efficient intra- and inter-view feature interaction from the reference to source views, (2) Omnidirectional multi-view feature representations, and (3) Multi-scale global feature aggregation. Extensive experimental results demonstrate MVSMamba outperforms state-of-the-art MVS methods on the DTU dataset and the Tanks-and-Temples benchmark with both superior performance and efficiency. The source code is available at https://github.com/JianfeiJ/MVSMamba.
Authors (7)
Jianfei Jiang
Qiankun Liu
Hongyuan Liu
Haochen Yu
Liyong Wang
Jiansheng Chen
+1 more
Submitted
November 3, 2025
arXiv Category
cs.CV

Key Contributions

MVSMamba is presented as the first Mamba-based MVS network. It aggregates global features with linear complexity, avoiding the quadratic cost of Transformer-based methods. It introduces a Dynamic Mamba module (DM-module) built on a reference-centered dynamic scanning strategy, which supports efficient intra- and inter-view feature interaction from the reference view to the source views, omnidirectional multi-view feature representations, and multi-scale global feature aggregation.
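
To make the linear-versus-quadratic point concrete, here is a minimal sketch in plain Python. It is not taken from the MVSMamba code and does not reproduce the DM-module or its scanning strategy. It assumes a toy scalar state-space recurrence with fixed, hypothetical coefficients a, b, and c (Mamba itself uses learned, input-dependent parameters), and it compares the number of recurrence updates against the number of token pairs that full self-attention would score. The feature-map sizes are also hypothetical.

```python
# Illustrative only: a toy linear state-space scan, not the MVSMamba implementation.


def ssm_scan(tokens, a=0.9, b=0.1, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a 1D sequence.

    The sequence is visited once, so the cost grows linearly with its length.
    The scalars a, b, c are fixed, hypothetical values chosen for illustration.
    """
    h = 0.0
    outputs = []
    for x in tokens:
        h = a * h + b * x  # the state carries information from all earlier tokens
        outputs.append(c * h)
    return outputs


def scan_steps(num_tokens):
    """Number of recurrence updates performed by ssm_scan."""
    return num_tokens


def pairwise_interactions(num_tokens):
    """Number of token pairs scored by one full self-attention layer."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Hypothetical feature-map sizes (height, width), flattened into sequences.
    for height, width in [(64, 80), (128, 160), (256, 320)]:
        n = height * width
        print(n, scan_steps(n), pairwise_interactions(n))
```

Quadrupling the number of tokens quadruples the scan steps but multiplies the attention pair count by sixteen, which is the efficiency gap the summary refers to.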

Business Value

More efficient and accurate 3D reconstruction could benefit applications such as autonomous driving, robotics, and virtual/augmented reality, potentially reducing computational cost and improving real-time performance.