
PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis

computer-vision › 3d-vision
📄 Abstract

Pairwise camera pose estimation from sparsely overlapping image pairs remains a critical and unsolved challenge in 3D vision, and most existing methods struggle when image pairs share little or no overlap. Recent approaches address this by synthesizing intermediate frames with video interpolation and selecting key frames via a self-consistency score. However, the generated frames are often blurry when the input overlap is small, and the selection strategies are slow and not explicitly aligned with pose estimation. To address these issues, we propose Hybrid Video Generation (HVG), which synthesizes clearer intermediate frames by coupling a video interpolation model with a pose-conditioned novel view synthesis model, together with a Feature Matching Selector (FMS) that uses feature correspondences to pick the synthesized intermediate frames best suited for pose estimation. Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter substantially improves pose estimation performance, especially on pairs with small or no overlap.
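
A common way such synthesized intermediate frames are exploited is to chain relative poses through them, so that a hard A→B estimate becomes two easier hops. The sketch below illustrates that chaining under this assumption with toy rotation-only poses; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def compose_relative_pose(T_a_to_mid, T_mid_to_b):
    # Chain two 4x4 SE(3) transforms: pose(A -> mid) followed by pose(mid -> B)
    # yields pose(A -> B). Helper names are illustrative only.
    return T_mid_to_b @ T_a_to_mid

def yaw_pose(degrees):
    # Toy rotation-only pose: yaw about the z-axis, no translation.
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return T

# Two easy 35-degree hops through a synthesized middle view recover a
# 70-degree relative rotation that direct two-view matching might miss.
T_ab = compose_relative_pose(yaw_pose(35.0), yaw_pose(35.0))
print(round(np.degrees(np.arctan2(T_ab[1, 0], T_ab[0, 0])), 1))  # 70.0
```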
Authors (6)
Qing Mao
Tianxin Huang
Yu Zhu
Jinqiu Sun
Yanning Zhang
Gim Hee Lee
Submitted
October 22, 2025
arXiv Category
cs.CV

Key Contributions

PoseCrafter introduces Hybrid Video Generation (HVG) to synthesize clearer intermediate frames for pose estimation, overcoming the blurriness caused by small-overlap inputs. It also proposes a Feature Matching Selector (FMS) that selects frames suited to pose estimation based on feature correspondences, improving both selection speed and alignment with the pose estimation objective.
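
As a rough illustration of correspondence-based frame selection (a sketch under assumptions, not the paper's FMS implementation), the snippet below scores each synthesized candidate by the number of ratio-test SIFT matches against one input image and keeps the highest-scoring frame; the detector choice and scoring rule are assumptions, and the actual FMS presumably ties its score more directly to the downstream pose estimator.

```python
import cv2

def select_intermediate_frame(query_img, candidate_frames, ratio=0.75):
    # Score each synthesized candidate by counting confident SIFT correspondences
    # with the query image; return the index and score of the best frame.
    # Illustrative stand-in only, not the paper's FMS criterion.
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher()
    _, des_q = sift.detectAndCompute(query_img, None)

    best_idx, best_score = -1, -1
    for i, frame in enumerate(candidate_frames):
        _, des_f = sift.detectAndCompute(frame, None)
        if des_q is None or des_f is None:
            continue
        # Lowe's ratio test keeps only unambiguous matches.
        matches = matcher.knnMatch(des_q, des_f, k=2)
        good = [p[0] for p in matches
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) > best_score:
            best_idx, best_score = i, len(good)
    return best_idx, best_score
```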

Business Value

Enables more robust and accurate 3D reconstruction and localization in scenarios with limited visual overlap, crucial for applications like AR/VR, robotics, and autonomous systems.