Audience: Computer Vision Researchers, Robotics Engineers, 3D Reconstruction Specialists, Machine Learning Engineers

MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching


Abstract

Cross-view matching is fundamentally achieved through cross-attention mechanisms. However, matching of high-resolution images remains challenging due to the quadratic complexity and lack of explicit matching constraints in the existing cross-attention. This paper proposes an attention mechanism, MatchAttention, that dynamically matches relative positions. The relative position determines the attention sampling center of the key-value pairs given a query. Continuous and differentiable sliding-window attention sampling is achieved by the proposed BilinearSoftmax. The relative positions are iteratively updated through residual connections across layers by embedding them into the feature channels. Since the relative position is exactly the learning target for cross-view matching, an efficient hierarchical cross-view decoder, MatchDecoder, is designed with MatchAttention as its core component. To handle cross-view occlusions, gated cross-MatchAttention and a consistency-constrained loss are proposed. These two components collectively mitigate the impact of occlusions in both forward and backward passes, allowing the model to focus more on learning matching relationships. When applied to stereo matching, MatchStereo-B ranked 1st in average error on the public Middlebury benchmark and requires only 29ms for KITTI-resolution inference. MatchStereo-T can process 4K UHD images in 0.1 seconds using only 3GB of GPU memory. The proposed models also achieve state-of-the-art performance on KITTI 2012, KITTI 2015, ETH3D, and Spring flow datasets. The combination of high accuracy and low computational complexity makes real-time, high-resolution, and high-accuracy cross-view matching possible. Code is available at https://github.com/TingmanYan/MatchAttention.
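The idea of sampling attention around a continuous, relative-position-dependent center can be illustrated with a 1D toy. The sketch below is not the paper's BilinearSoftmax: it assumes scalar features, a dot-product score, and a simple linear blend of the two nearest integer-centered windows, and the function names (`window_attention`, `sampled_attention`) and the `radius` parameter are hypothetical. It only shows how a fractional center can yield an output that varies continuously with the relative position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(query, keys, values, center, radius):
    """Softmax attention over keys in [center - radius, center + radius].

    Indices outside the sequence are clamped to the nearest valid index.
    """
    idx = [min(max(center + o, 0), len(keys) - 1)
           for o in range(-radius, radius + 1)]
    weights = softmax([query * keys[i] for i in idx])
    return sum(w * values[i] for w, i in zip(weights, idx))


def sampled_attention(query, keys, values, position, rel_pos, radius=1):
    """Attend around the continuous center `position + rel_pos`.

    Assumption for illustration: the fractional center is handled by
    linearly blending the outputs of the two nearest integer-centered
    windows, so the result is continuous in `rel_pos`.
    """
    center = position + rel_pos
    lo = math.floor(center)
    frac = center - lo
    out_lo = window_attention(query, keys, values, lo, radius)
    out_hi = window_attention(query, keys, values, lo + 1, radius)
    return (1.0 - frac) * out_lo + frac * out_hi
```

In this toy, each query touches only `2 * radius + 1` keys per window regardless of sequence length, which is the property that lets windowed sampling avoid the quadratic cost of attending to every key.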

Authors (5)

Tingman Yan

Tao Liu

Xilian Yang

Qunfei Zhao

Zeyang Xia

Submitted

October 16, 2025

arXiv Category

cs.CV


Key Contributions

Proposes MatchAttention, a novel attention mechanism for high-resolution cross-view matching that dynamically matches relative positions, avoiding the quadratic complexity of standard cross-attention. It introduces BilinearSoftmax for continuous, differentiable sliding-window attention sampling and a hierarchical MatchDecoder, and handles occlusions with gated cross-MatchAttention and a consistency-constrained loss.
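For readers unfamiliar with consistency constraints in stereo matching, the sketch below shows the classic left-right consistency check on a single image row and a loss that ignores pixels failing it. This is a standard technique used here for illustration only; it is not the paper's consistency-constrained loss, and the function names, the sign convention for disparities, and the `threshold` value are assumptions.

```python
def consistency_mask(disp_left, disp_right, threshold=1.0):
    """Flag left-view pixels whose match is confirmed by the right view.

    Assumed convention: left pixel x matches right pixel x - disp_left[x],
    and both maps store positive disparities. A pixel is kept when the
    right view's disparity at the matched location agrees within
    `threshold`; matches that fall outside the row are rejected.
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = round(x - d)
        if xr < 0 or xr >= width:
            mask.append(False)
            continue
        mask.append(abs(d - disp_right[xr]) <= threshold)
    return mask


def masked_l1(pred, target, mask):
    """Mean absolute error over the pixels where mask is True."""
    kept = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0
```

Pixels that fail the check are typically occluded or mismatched, so excluding them keeps unreliable regions from dominating the training signal.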

Business Value

Enables more accurate and efficient 3D reconstruction and scene understanding from high-resolution imagery, crucial for applications like autonomous navigation, robotics, and detailed digital twins.

Paper Metadata

Innovation Type

Algorithmic/Architecture

Deployment Feasibility

Moderate. The proposed attention mechanism and decoder are designed to be integrated into existing vision pipelines. Computational efficiency is a key design goal.

Limitations Addressed

The quadratic complexity and lack of explicit matching constraints in existing cross-attention mechanisms hinder high-resolution cross-view matching. Cross-view occlusions are also a challenge.
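To make the quadratic-complexity point concrete, the arithmetic below counts query-key pairs for dense cross-attention versus a fixed local window at 4K UHD resolution (3840 x 2160). The 3 x 3 window size is an assumption chosen for illustration, not a value taken from the paper, and real models usually attend at reduced feature resolution, so the absolute numbers are only indicative.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs for one attention pass over an image.

    window=None means dense attention (every query sees every key);
    otherwise each query sees a window x window neighborhood.
    """
    n = height * width
    if window is None:
        return n * n
    return n * window * window


height, width = 2160, 3840  # 4K UHD
pixels = height * width
full = attention_pairs(height, width)
local = attention_pairs(height, width, window=3)  # assumed 3x3 window

print(f"pixels:           {pixels:,}")
print(f"dense pairs:      {full:,}")
print(f"windowed pairs:   {local:,}")
print(f"reduction factor: {full // local:,}x")
```

Dense attention grows with the square of the pixel count, while windowed attention grows linearly, so the gap widens as resolution increases.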

Performance Gains

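Improved accuracy and efficiency in high-resolution cross-view matching compared to standard attention mechanisms. Per the abstract, MatchStereo-B ranked 1st in average error on the public Middlebury benchmark and needs only 29 ms for KITTI-resolution inference, while MatchStereo-T processes 4K UHD images in 0.1 seconds using only 3 GB of GPU memory.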

Technical Tags

cross-view matching, high-resolution images, cross-attention, relative positions, BilinearSoftmax, hierarchical decoder, occlusions, consistency constraint, MatchAttention, MatchDecoder

Research Topics

Computer Vision, Image Matching, 3D Reconstruction, Attention Mechanisms, High-Resolution Image Processing

Methods & Architectures

MatchAttention, BilinearSoftmax, hierarchical cross-view decoder (MatchDecoder), gated cross-MatchAttention, consistency-constrained loss, hierarchical decoder, attention mechanism

Applications & Tasks

Applications: 3D Reconstruction, Simultaneous Localization and Mapping (SLAM), Robotics, Autonomous Driving, Photogrammetry, Image Stitching

Problems addressed: Quadratic complexity in high-resolution cross-view matching, Lack of explicit matching constraints in cross-attention, Handling cross-view occlusions

Tasks: Cross-view matching, High-resolution image correspondence, 3D scene reconstruction

Related Fields

Computer Vision, Deep Learning, 3D Computer Vision, Robotics, Machine Learning Architectures

Keywords

cross-view matching, attention, high-resolution, relative position, image correspondence, 3D reconstruction, MatchAttention, MatchDecoder, occlusion, BilinearSoftmax, SLAM

Academic Context

#Computer Vision, #Image Matching, #3D Reconstruction, #Attention Mechanisms, #High-Resolution Image Processing

Commercial Potential

Potential Products

High-accuracy 3D mapping software, Robotics perception modules, Tools for photogrammetry and digital modeling

Target Industries

Robotics, Autonomous Vehicles, Geospatial, Construction, Virtual Reality, Augmented Reality

Use Case Examples

Accurate camera pose estimation for autonomous robots in complex environments. Generating detailed 3D models of buildings from aerial or ground-level imagery. Improving visual odometry in SLAM systems.

Competitive Edge

Offers a more efficient and effective solution for high-resolution cross-view matching by explicitly modeling relative positions within an attention framework, addressing limitations of generic cross-attention.

Market Opportunity

Growing demand for advanced computer vision solutions in robotics, AR/VR, and mapping.

Revenue Models

Licensing the technology to companies developing 3D reconstruction or robotics software; offering specialized perception modules.

Resource Requirements

Compute Needs

Optimized for efficiency, but high-resolution image processing can still be computationally intensive.

Data Requirements

Requires datasets with multiple views of scenes, suitable for training correspondence and matching models.

Deployment Constraints

Performance depends on image quality, resolution, and the degree of visual similarity/overlap between views. Handling extreme occlusions remains a challenge.

Scalability

The hierarchical decoder and attention sampling are designed to manage complexity, aiming for scalability with resolution.

Production Readiness

Maturity Level

Research

Time to Market

2-3 years

Patent Potential

High, for the MatchAttention mechanism and the MatchDecoder architecture.
