📄 Abstract
In this paper, we present DM-Calib, a diffusion-based approach for estimating
pinhole camera intrinsic parameters from a single input image. Monocular camera
calibration is essential for many 3D vision tasks. However, most existing
methods depend on handcrafted assumptions or are constrained by limited
training data, resulting in poor generalization across diverse real-world
images. Recent advancements in stable diffusion models, trained on massive
data, have shown the ability to generate high-quality images with varied
characteristics. Emerging evidence indicates that these models implicitly
capture the relationship between camera focal length and image content.
Building on this insight, we explore how to leverage the powerful priors of
diffusion models for monocular pinhole camera calibration. Specifically, we
introduce a new image-based representation, termed Camera Image, which
losslessly encodes the numerical camera intrinsics and integrates seamlessly
with the diffusion framework. Using this representation, we reformulate the
problem of estimating camera intrinsics as the generation of a dense Camera
Image conditioned on an input image. By fine-tuning a stable diffusion model to
generate a Camera Image from a single RGB input, we can extract camera
intrinsics via a RANSAC operation. We further demonstrate that our monocular
calibration method enhances performance across various 3D tasks, including
zero-shot metric depth estimation, 3D metrology, pose estimation, and
sparse-view reconstruction. Extensive experiments on multiple public datasets
show that our approach significantly outperforms baselines and provides broad
benefits to 3D vision tasks.
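The idea of losslessly encoding intrinsics as a dense image, then recovering them with RANSAC, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes (hypothetically) that the Camera Image stores per-pixel unit ray directions derived from the intrinsics, and the helper names `make_camera_image`, `ransac_axis`, and `recover_intrinsics` are invented for the example. Under that assumption, since u = fx·(x/z) + cx for each pixel, recovery reduces to a robust 2-point line fit per axis.

```python
import numpy as np

def make_camera_image(K, h, w):
    """Hypothetical Camera Image: per-pixel unit ray directions computed
    from the intrinsics K (an assumed encoding, for illustration only)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

def ransac_axis(coords, ratios, iters=200, thresh=0.5, seed=0):
    """RANSAC fit of coords = f * ratios + c from 2-point samples."""
    rng = np.random.default_rng(seed)
    best, best_inliers = (None, None), -1
    n = coords.size
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        if np.isclose(ratios[i], ratios[j]):
            continue  # degenerate sample (same column/row)
        f = (coords[i] - coords[j]) / (ratios[i] - ratios[j])
        c = coords[i] - f * ratios[i]
        inliers = np.sum(np.abs(coords - (f * ratios + c)) < thresh)
        if inliers > best_inliers:
            best, best_inliers = (f, c), inliers
    return best

def recover_intrinsics(cam_img):
    """Extract (fx, fy, cx, cy) from the dense ray map via per-axis RANSAC."""
    h, w, _ = cam_img.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    x, y, z = cam_img[..., 0], cam_img[..., 1], cam_img[..., 2]
    fx, cx = ransac_axis(u.ravel(), (x / z).ravel())  # u = fx*(x/z) + cx
    fy, cy = ransac_axis(v.ravel(), (y / z).ravel())  # v = fy*(y/z) + cy
    return fx, fy, cx, cy
```

On clean synthetic data the fit is exact; in practice the generated Camera Image is noisy, which is why a robust estimator such as RANSAC is needed rather than a direct least-squares solve.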