
Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

📄 Abstract

Unsupervised monocular depth estimation has received widespread attention because it can be trained without ground truth. In real-world scenarios, images may be blurry or noisy due to weather conditions and inherent limitations of the camera, so developing a robust depth estimation model is particularly important. Benefiting from the training strategies of generative networks, generative methods often exhibit enhanced robustness. In light of this, we employ a diffusion model, a generative network with good convergence properties, for unsupervised monocular depth estimation. We further propose a hierarchical feature-guided denoising module, which significantly enriches the model's capacity for learning and interpreting the depth distribution by fully leveraging image features to guide the denoising process. In addition, we explore the implicit depth within reprojection and design an implicit depth consistency loss, which improves model performance and enforces scale consistency of depth within a video sequence. We conduct experiments on the KITTI, Make3D, and our self-collected SIMIT datasets. The results indicate that our approach stands out among generative-based models while also showing remarkable robustness.
Authors (8)
Runze Liu
Dongchen Zhu
Guanghui Zhang
Yue Xu
Wenjun Shi
Xiaolin Zhang
+2 more
Submitted
June 14, 2024
arXiv Category
cs.CV

Key Contributions

This paper proposes a novel unsupervised monocular depth estimation method using a diffusion model. It introduces a hierarchical feature-guided denoising module to enhance feature interpretation and an implicit depth consistency loss leveraging reprojection, aiming to improve robustness in real-world scenarios with blurry or noisy images.
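The summary does not spell out how an "implicit depth" arises from reprojection, but the idea is standard in self-supervised depth estimation: backprojecting a target pixel to 3D and transforming it into a neighboring camera's frame yields a predicted depth for that view, which can be compared against the depth the network predicts there. The sketch below illustrates that comparison with numpy; the function name, the nearest-neighbour sampling, and the relative-difference penalty are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def implicit_depth_consistency(depth_t, depth_s, K, R, t):
    """Toy sketch of a reprojection-based depth consistency check.

    Backprojects target pixels to 3D, transforms them into the source
    camera frame, and compares the transformed z-coordinate (the
    "implicit depth") against the source depth map sampled at the
    projected pixel location. All names here are illustrative.
    """
    h, w = depth_t.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    # Backproject to 3D points in the target camera frame.
    cam = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)
    # Rigid transform into the source camera frame.
    cam_s = R @ cam + t.reshape(3, 1)
    z = cam_s[2]                        # implicit depth in the source view
    proj = K @ (cam_s / z)              # project onto the source image plane
    us = np.clip(np.round(proj[0]).astype(int), 0, w - 1)
    vs = np.clip(np.round(proj[1]).astype(int), 0, h - 1)
    sampled = depth_s[vs, us]           # nearest-neighbour depth sampling
    # Symmetric relative difference, a common depth-consistency penalty.
    return np.mean(np.abs(sampled - z) / (sampled + z))

# A fronto-parallel plane at depth 1.0, with the camera moved 0.1 closer:
K = np.array([[100.0, 0.0, 1.5], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth_t = np.ones((4, 4))
depth_s = np.full((4, 4), 0.9)          # same plane, seen 0.1 nearer
loss = implicit_depth_consistency(depth_t, depth_s, K, np.eye(3),
                                  np.array([0.0, 0.0, -0.1]))
print(loss)  # consistent depth maps give (near-)zero loss
```

When the two depth maps describe the same geometry, the sampled source depth matches the transformed depth and the penalty vanishes; a scale mismatch between frames makes it grow, which is how such a term enforces scale consistency across a video sequence.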

Business Value

Enables more accurate 3D scene understanding from single images, which is crucial for applications like autonomous navigation, augmented reality, and robotic manipulation where precise depth information is vital.