
PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis

computer-vision › 3d-vision
📄 Abstract

Pairwise camera pose estimation from sparsely overlapping image pairs remains a critical and unsolved challenge in 3D vision, and most existing methods struggle when image pairs share little or no overlap. Recent approaches address this by synthesizing intermediate frames with video interpolation and selecting key frames via a self-consistency score. However, the generated frames are often blurry when the input overlap is small, and the selection strategies are slow and not explicitly aligned with pose estimation. To address these issues, we propose Hybrid Video Generation (HVG), which synthesizes clearer intermediate frames by coupling a video interpolation model with a pose-conditioned novel view synthesis model, together with a Feature Matching Selector (FMS) that uses feature correspondences to pick the synthesized intermediate frames best suited for pose estimation. Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K, and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter substantially improves pose estimation performance, especially on pairs with small or no overlap.
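
A common way such synthesized intermediate frames are exploited is to chain relative poses through them, so that a hard A→B estimate becomes two easier hops. The sketch below illustrates that chaining under this assumption with toy rotation-only poses; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def compose_relative_pose(T_a_to_mid, T_mid_to_b):
    # Chain two 4x4 SE(3) transforms: pose(A -> mid) followed by pose(mid -> B)
    # yields pose(A -> B). Helper names are illustrative only.
    return T_mid_to_b @ T_a_to_mid

def yaw_pose(degrees):
    # Toy rotation-only pose: yaw about the z-axis, no translation.
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return T

# Two easy 35-degree hops through a synthesized middle view recover a
# 70-degree relative rotation that direct two-view matching might miss.
T_ab = compose_relative_pose(yaw_pose(35.0), yaw_pose(35.0))
print(round(np.degrees(np.arctan2(T_ab[1, 0], T_ab[0, 0])), 1))  # 70.0
```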
Authors (6)
Qing Mao
Tianxin Huang
Yu Zhu
Jinqiu Sun
Yanning Zhang
Gim Hee Lee
Submitted
October 22, 2025
arXiv Category
cs.CV

Key Contributions

PoseCrafter introduces Hybrid Video Generation (HVG) to synthesize clearer intermediate frames for pose estimation, overcoming the blurriness caused by small-overlap inputs. It also proposes a Feature Matching Selector (FMS) that selects frames suited to pose estimation based on feature correspondences, improving both selection speed and alignment with the pose estimation objective.
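
As a rough illustration of correspondence-based frame selection (a sketch under assumptions, not the paper's FMS implementation), the snippet below scores each synthesized candidate by the number of ratio-test SIFT matches against one input image and keeps the highest-scoring frame; the detector choice and scoring rule are assumptions, and the actual FMS presumably ties its score more directly to the downstream pose estimator.

```python
import cv2

def select_intermediate_frame(query_img, candidate_frames, ratio=0.75):
    # Score each synthesized candidate by counting confident SIFT correspondences
    # with the query image; return the index and score of the best frame.
    # Illustrative stand-in only, not the paper's FMS criterion.
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher()
    _, des_q = sift.detectAndCompute(query_img, None)

    best_idx, best_score = -1, -1
    for i, frame in enumerate(candidate_frames):
        _, des_f = sift.detectAndCompute(frame, None)
        if des_q is None or des_f is None:
            continue
        # Lowe's ratio test keeps only unambiguous matches.
        matches = matcher.knnMatch(des_q, des_f, k=2)
        good = [p[0] for p in matches
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) > best_score:
            best_idx, best_score = i, len(good)
    return best_idx, best_score
```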

Business Value

Enables more robust and accurate 3D reconstruction and localization in scenarios with limited visual overlap, crucial for applications like AR/VR, robotics, and autonomous systems.