Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: While supervised stereo matching and monocular depth estimation have advanced
significantly with learning-based algorithms, self-supervised methods using
stereo images as supervision signals have received relatively less focus and
require further investigation. A primary challenge arises from ambiguity
introduced during photometric reconstruction, particularly due to missing
corresponding pixels in ill-posed regions of the target view, such as
occlusions and out-of-frame areas. To address this and establish explicit
photometric correspondences, we propose DMS, a model-agnostic approach that
utilizes geometric priors from diffusion models to synthesize novel views along
the epipolar direction, guided by directional prompts. Specifically, we
finetune a Stable Diffusion model to simulate perspectives at key positions:
left-left view shifted from the left camera, right-right view shifted from the
right camera, along with an additional novel view between the left and right
cameras. These synthesized views supplement occluded pixels, enabling explicit
photometric reconstruction. Our proposed DMS is a cost-free, ''plug-and-play''
method that seamlessly enhances self-supervised stereo matching and monocular
depth estimation, and relies solely on unlabeled stereo image pairs for both
training and synthesizing. Extensive experiments demonstrate the effectiveness
of our approach, with up to 35% outlier reduction and state-of-the-art
performance across multiple benchmark datasets.