Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: To reconstruct the 3D geometry from calibrated images, learning-based
multi-view stereo (MVS) methods typically perform multi-view depth estimation
and then fuse depth maps into a mesh or point cloud. To improve the
computational efficiency, many methods initialize a coarse depth map and then
gradually refine it in higher resolutions. Recently, diffusion models achieve
great success in generation tasks. Starting from a random noise, diffusion
models gradually recover the sample with an iterative denoising process. In
this paper, we propose a novel MVS framework, which introduces diffusion models
in MVS. Specifically, we formulate depth refinement as a conditional diffusion
process. Considering the discriminative characteristic of depth estimation, we
design a condition encoder to guide the diffusion process. To improve
efficiency, we propose a novel diffusion network combining lightweight 2D U-Net
and convolutional GRU. Moreover, we propose a novel confidence-based sampling
strategy to adaptively sample depth hypotheses based on the confidence
estimated by diffusion model. Based on our novel MVS framework, we propose two
novel MVS methods, DiffMVS and CasDiffMVS. DiffMVS achieves competitive
performance with state-of-the-art efficiency in run-time and GPU memory.
CasDiffMVS achieves state-of-the-art performance on DTU, Tanks & Temples and
ETH3D. Code is available at: https://github.com/cvg/diffmvs.