
GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation

Abstract

We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance. While existing DB-MDE methods excel at predicting relative depth, estimating absolute metric depth remains challenging due to scale ambiguities in single-image scenarios. To address this, we reframe depth estimation as an inverse problem, leveraging pretrained latent diffusion models (LDMs) conditioned on RGB images, combined with stereo-based geometric constraints, to learn scale and shift for accurate depth recovery. Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments. Extensive experiments demonstrate that our approach matches or surpasses state-of-the-art methods, particularly in challenging scenarios involving translucent and specular surfaces, all without requiring retraining.
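The scale and shift ambiguity mentioned in the abstract can be illustrated with a small sketch. This is not the GeoDiff procedure, which applies stereo guidance inside the diffusion process; it is the simpler, widely used least-squares alignment of a relative depth map to a set of metric reference depths. The function name `fit_scale_shift` and the synthetic data are hypothetical, and the sketch assumes the relative prediction is related to metric depth by a single affine map.

```python
import numpy as np

def fit_scale_shift(relative_depth, metric_depth, valid_mask):
    """Least-squares fit of (scale, shift) so that
    scale * relative_depth + shift approximates metric_depth on valid pixels."""
    x = relative_depth[valid_mask].astype(np.float64)
    y = metric_depth[valid_mask].astype(np.float64)
    # Design matrix with one column for the scale term and one for the shift.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(scale), float(shift)

# Synthetic example (illustrative data, not from the paper).
rng = np.random.default_rng(0)
true_metric = rng.uniform(1.0, 10.0, size=(48, 64))   # metres
relative = (true_metric - 0.5) / 2.0                  # metric = 2 * relative + 0.5
valid = rng.random(true_metric.shape) < 0.1           # sparse reference pixels

scale, shift = fit_scale_shift(relative, true_metric, valid)
recovered = scale * relative + shift
```

Only a sparse subset of pixels needs a metric reference here, since two unknowns are being estimated for the whole image.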
Authors: Tuan Pham, Thanh-Tung Le, Xiaohui Xie, Stephan Mandt
Submitted: October 21, 2025
arXiv Category: cs.CV

Key Contributions

Introduces GeoDiff, a framework that enhances diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance for metric depth estimation. It reframes depth estimation as an inverse problem, leveraging LDMs and stereo constraints in a training-free manner.
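The summary does not say how the stereo constraints are computed. As background only, the sketch below shows the textbook relation that lets a rectified stereo pair supply metric depth: depth equals focal length times baseline divided by disparity. The function name `disparity_to_depth`, the camera parameters, and the sample disparities are all hypothetical, and a pinhole camera model is assumed.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disparity=1e-6):
    """Convert disparity (pixels) from a rectified stereo pair to depth (metres).

    Pixels with disparity at or below min_disparity are marked invalid (NaN).
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity_px.shape, np.nan)
    valid = disparity_px > min_disparity
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth, valid

# Illustrative values: 640 px focal length, 10 cm baseline.
disparity = np.array([[64.0, 32.0],
                      [0.0, 16.0]])
depth_m, valid = disparity_to_depth(disparity, focal_px=640.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the depth, and zero disparity has no finite depth, which is why such pixels are masked out.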

Business Value

Enables more accurate 3D perception for robots and autonomous systems, improving navigation, object interaction, and scene understanding, especially in visually difficult scenes such as those with translucent or specular surfaces. Because the method is training-free, it can be added to existing DB-MDE pipelines without retraining.