Abstract
Pairwise camera pose estimation from sparsely overlapping image pairs remains
a critical and unsolved challenge in 3D vision. Most existing methods struggle
with image pairs that have small or no overlap. Recent approaches attempt to
address this by synthesizing intermediate frames with video interpolation and
selecting key frames via a self-consistency score. However, the generated
frames are often blurry because the input views overlap so little, and the
selection strategies are slow and not explicitly aligned with pose estimation.
To address these issues, we propose Hybrid Video Generation (HVG), which
synthesizes clearer intermediate frames by coupling a video interpolation
model with a pose-conditioned novel view synthesis model. We further propose a
Feature Matching Selector (FMS) that uses feature correspondences to select,
from the synthesized results, the intermediate frames best suited to pose
estimation. Extensive experiments on Cambridge Landmarks, ScanNet, DL3DV-10K,
and NAVI demonstrate that, compared to existing SOTA methods, PoseCrafter
substantially improves pose estimation performance, especially on examples
with small or no overlap.
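To make the selection idea concrete, below is a minimal sketch, not the authors' implementation, of a feature-correspondence frame selector in the spirit of FMS: among synthesized intermediate frames, keep the one with the most reliable matches to both input views. The use of SIFT, Lowe's ratio test, and the summed-match score are our own illustrative assumptions.

```python
import cv2
import numpy as np

def match_count(img_a, img_b, detector, ratio=0.75):
    """Count feature matches between two grayscale images that pass Lowe's ratio test."""
    _, des_a = detector.detectAndCompute(img_a, None)
    _, des_b = detector.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_a, des_b, k=2)
    return sum(1 for pair in knn
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

def select_intermediate_frame(view_1, view_2, candidates):
    """Pick the synthesized frame best connected to both input views by feature matches.

    Hypothetical scoring rule: sum of ratio-test matches to each input view.
    """
    sift = cv2.SIFT_create()
    scores = [match_count(view_1, c, sift) + match_count(view_2, c, sift)
              for c in candidates]
    return candidates[int(np.argmax(scores))], scores
```

A selector of this form is fast (one pass of feature matching per candidate) and is directly tied to the quantity pose estimation relies on, namely the number of usable correspondences, which is the motivation the paper gives for preferring correspondence-based selection over self-consistency scoring.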
Authors (6)
Qing Mao
Tianxin Huang
Yu Zhu
Jinqiu Sun
Yanning Zhang
Gim Hee Lee
Submitted
October 22, 2025
Key Contributions
PoseCrafter introduces Hybrid Video Generation (HVG) to synthesize clearer intermediate frames for pose estimation, overcoming blurriness from small overlap inputs. It also proposes a Feature Matching Selector (FMS) for selecting appropriate frames based on feature correspondence, improving speed and alignment with pose estimation goals.
Business Value
Enables more robust and accurate 3D reconstruction and localization in scenarios with limited visual overlap, crucial for applications like AR/VR, robotics, and autonomous systems.