📄 Abstract
RGB-based novel object pose estimation is critical for rapid deployment in
robotic applications, yet zero-shot generalization remains a key challenge. In
this paper, we introduce PicoPose, a novel framework designed to tackle this
task using a three-stage pixel-to-pixel correspondence learning process.
Firstly, PicoPose matches features from the RGB observation with those from
rendered object templates, identifying the best-matched template and
establishing coarse correspondences. Secondly, PicoPose smooths the
correspondences by globally regressing a 2D affine transformation, including
in-plane rotation, scale, and 2D translation, from the coarse correspondence
map. Thirdly, PicoPose applies the affine transformation to the feature map of
the best-matched template and learns correspondence offsets within local
regions to achieve fine-grained correspondences. By progressively refining the
correspondences, PicoPose significantly improves the accuracy of object poses
computed via PnP/RANSAC. PicoPose achieves state-of-the-art performance on the
seven core datasets of the BOP benchmark, demonstrating exceptional
generalization to novel objects. Code and trained models are available at
https://github.com/foollh/PicoPose.
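
To make the second and third stages concrete, the following is a minimal sketch of how a 2D affine transformation built from in-plane rotation, scale, and 2D translation can map template pixel coordinates into the observation, and how per-pixel offsets can then refine those coordinates. It is an illustration under stated assumptions, not the authors' implementation: the function names (make_affine, apply_affine, refine_with_offsets), the isotropic scale, and the rotation-about-the-origin convention are all hypothetical, and the abstract does not say how the networks regress these quantities or whether the transformation is applied to coordinates or to feature maps in this exact form.

```python
import math

def make_affine(theta, scale, tx, ty):
    """Build a 2x3 affine matrix from an in-plane rotation (radians),
    an isotropic scale, and a 2D translation.
    Assumption: rotation is about the origin and scale is isotropic."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s, tx],
            [scale * s,  scale * c, ty]]

def apply_affine(A, points):
    """Map (x, y) template coordinates through the 2x3 matrix A."""
    return [(A[0][0] * x + A[0][1] * y + A[0][2],
             A[1][0] * x + A[1][1] * y + A[1][2])
            for x, y in points]

def refine_with_offsets(A, points, offsets):
    """Apply the global affine map, then add a local (du, dv) offset
    per point. In this sketch the offsets are given as input; in the
    paper they are described as learned within local regions."""
    coarse = apply_affine(A, points)
    return [(u + du, v + dv) for (u, v), (du, dv) in zip(coarse, offsets)]

# Hypothetical values: rotate by 90 degrees, scale by 2, shift by (10, 5).
A = make_affine(math.pi / 2, 2.0, 10.0, 5.0)
template_pts = [(1.0, 0.0), (0.0, 1.0)]
fine_pts = refine_with_offsets(A, template_pts, [(0.5, -0.5), (0.0, 0.0)])
```

The resulting 2D coordinates, paired with the 3D model points that the template pixels were rendered from, are the kind of 2D-3D correspondences a PnP/RANSAC solver takes as input to recover the object pose.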