Abstract: Accurate depth estimation is at the core of many applications in computer graphics, vision, and robotics. Current state-of-the-art monocular depth estimators, trained on extensive datasets, generalize well but lack the 3D consistency needed for many applications. In this paper, we combine the strengths of these generalizing monocular depth estimation techniques with multi-view data, framing the task as an analysis-by-synthesis optimization problem that lifts and refines such relative depth maps into accurate, error-free depth maps. After an initial global scale estimation from structure-from-motion point clouds, we further refine the depth map through an optimization that enforces multi-view consistency via photometric and geometric losses with differentiable rendering of the meshed depth map. In a two-stage optimization, we first refine the scale and then correct artifacts and errors in the depth map via nearby-view photometric supervision. Our evaluation shows that our method generates detailed, high-quality, view-consistent, accurate depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art multi-view depth reconstruction approaches on such datasets.
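
The initial global scale estimation can be illustrated by least-squares alignment of the relative monocular depth map to sparse structure-from-motion depths. The following is a minimal sketch of that idea, not the paper's implementation; the function and variable names, and the closed-form single-scale model (the paper may use a more robust or scale-and-shift variant), are assumptions for illustration.

```python
import numpy as np

def estimate_global_scale(mono_depth, sfm_depth, sfm_uv):
    """Least-squares scale aligning a relative monocular depth map to
    sparse SfM depths. Hypothetical sketch: the paper's actual
    alignment may differ (e.g., robust weighting or scale-and-shift).

    mono_depth: (H, W) relative depth map from a monocular estimator
    sfm_depth:  (N,) metric depths of SfM points projected into the view
    sfm_uv:     (N, 2) integer pixel coordinates (u, v) of those points
    """
    # Sample the monocular depth at the sparse SfM pixel locations.
    d_mono = mono_depth[sfm_uv[:, 1], sfm_uv[:, 0]]
    valid = (d_mono > 0) & (sfm_depth > 0)
    d_mono, d_sfm = d_mono[valid], sfm_depth[valid]
    # Closed-form least-squares scale: argmin_s || s * d_mono - d_sfm ||^2
    return np.dot(d_mono, d_sfm) / np.dot(d_mono, d_mono)

# Usage: lift the relative depth map to an approximately metric scale,
# to be refined further by the multi-view optimization.
#   scale = estimate_global_scale(mono, sfm_d, sfm_uv)
#   metric_depth = scale * mono
```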
Project page and source code can be found at
https://lorafib.github.io/ref_depth/.
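
To make the multi-view photometric supervision concrete, the sketch below computes a nearby-view photometric loss from a (scaled) depth map. The paper optimizes through differentiable rendering of the meshed depth map; this sketch substitutes simpler depth-based inverse warping as a stand-in that yields the same style of supervision signal, and all function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def photometric_loss(depth_ref, img_ref, img_nearby, K, T_ref_to_nearby):
    """Nearby-view photometric loss via depth-based inverse warping.
    Hypothetical stand-in for the paper's differentiable mesh rendering.

    depth_ref: (H, W) depth of the reference view (requires grad)
    img_ref, img_nearby: (3, H, W) images in [0, 1]
    K: (3, 3) intrinsics; T_ref_to_nearby: (4, 4) relative pose
    """
    H, W = depth_ref.shape
    dev = depth_ref.device
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32, device=dev),
                          torch.arange(W, dtype=torch.float32, device=dev),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)
    # Unproject reference pixels to 3D using the (scaled) depth map.
    pts = torch.linalg.inv(K) @ pix * depth_ref.reshape(1, -1)
    pts_h = torch.cat([pts, torch.ones(1, H * W, device=dev)], dim=0)
    # Transform into the nearby view and project back to pixel space.
    pts_n = (T_ref_to_nearby @ pts_h)[:3]
    proj = K @ pts_n
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    # Normalize to [-1, 1] for grid_sample and warp the nearby image.
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], dim=-1).reshape(1, H, W, 2)
    warped = F.grid_sample(img_nearby.unsqueeze(0), grid,
                           align_corners=True, padding_mode="border")
    # L1 photometric error against the reference image; large residuals
    # indicate depth values that violate multi-view consistency.
    return (warped.squeeze(0) - img_ref).abs().mean()
```

Under the paper's two-stage schedule, one would first minimize such a loss with respect to a global scale refinement only, and afterwards with respect to per-pixel depth corrections, so that gross misalignment is resolved before local artifacts and errors are addressed.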