Abstract: 3D Gaussian Splatting (3DGS) has demonstrated remarkable real-time
performance in novel view synthesis, yet its effectiveness relies heavily on
dense multi-view inputs with precisely known camera poses, which are rarely
available in real-world scenarios. When input views become extremely sparse,
the Structure-from-Motion (SfM) method that 3DGS depends on for initialization
fails to accurately reconstruct the 3D geometric structures of scenes,
resulting in degraded rendering quality. In this paper, we propose a novel
SfM-free 3DGS-based method that jointly estimates camera poses and reconstructs
3D scenes from extremely sparse-view inputs. Specifically, instead of SfM, we
propose a dense stereo module that progressively estimates camera pose
information and reconstructs a global dense point cloud for initialization. To
address the inherent problem of information scarcity in extremely sparse-view
settings, we propose a coherent view interpolation module that interpolates
camera poses based on training view pairs and generates viewpoint-consistent
content as additional supervision signals for training. Furthermore, we
introduce multi-scale Laplacian consistent regularization and adaptive
spatial-aware multi-scale geometry regularization to enhance the quality of
geometrical structures and rendered content. Experiments show that our method
significantly outperforms other state-of-the-art 3DGS-based approaches,
achieving a remarkable 2.75 dB improvement in PSNR under extremely sparse-view
conditions (using only 2 training views). The images synthesized by our method
exhibit minimal distortion while preserving rich high-frequency details,
resulting in superior visual quality compared to existing techniques.
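
The abstract only names the dense stereo module, so the following is a rough illustration of the kind of initialization it produces rather than the authors' pipeline: backprojecting a predicted depth map into a world-space point cloud under an assumed intrinsic matrix K and camera-to-world pose (R, t). All function and variable names here are hypothetical.

```python
# Illustrative only: lifting one view's predicted depth into world-space points,
# as a single step toward the global dense point cloud used for initialization.
# depth, K, R, t are assumed inputs from some dense stereo predictor.
import numpy as np

def backproject_to_world(depth, K, R, t):
    """Lift a depth map (H x W) into world-space 3D points (H*W x 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = (np.linalg.inv(K) @ pix.T).T        # camera-space ray directions
    pts_cam = rays * depth.reshape(-1, 1)      # scale rays by predicted depth
    pts_world = (R @ pts_cam.T).T + t          # apply camera-to-world transform
    return pts_world
```

Points backprojected this way from each training view can be concatenated into a single cloud; how the actual method fuses and filters them is not specified in the abstract.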
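The coherent view interpolation module is likewise described only at a high level. One plausible reading of its pose-interpolation step is spherical linear interpolation of rotations combined with linear interpolation of translations between a training view pair, sketched below; the generation of viewpoint-consistent content for supervision is not shown, and the names and parameters are illustrative assumptions.

```python
# Minimal sketch of interpolating camera poses between a training view pair.
# Assumes poses are given as 3x3 rotation matrices and translation vectors;
# this is not the authors' actual API.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(R0, t0, R1, t1, num_virtual=3):
    """Generate intermediate camera poses between two training views."""
    key_rots = Rotation.from_matrix(np.stack([R0, R1]))
    slerp = Slerp([0.0, 1.0], key_rots)                     # rotation interpolation
    alphas = np.linspace(0.0, 1.0, num_virtual + 2)[1:-1]   # interior samples only
    poses = []
    for a in alphas:
        R_i = slerp(a).as_matrix()        # interpolated rotation
        t_i = (1.0 - a) * t0 + a * t1     # linearly interpolated translation
        poses.append((R_i, t_i))
    return poses
```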
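Finally, the multi-scale Laplacian consistent regularization is named without a formulation in the abstract. The sketch below shows one possible instantiation, an L1 penalty on Laplacian filter responses of rendered versus reference images across an image pyramid; the pyramid construction, the 3x3 kernel, and the equal per-scale weights are assumptions, not the paper's definition.

```python
# Hedged sketch of a multi-scale Laplacian consistency term (assumed form).
import torch
import torch.nn.functional as F

_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian_response(img):
    """Apply a 3x3 Laplacian filter per channel (img: B x C x H x W)."""
    c = img.shape[1]
    kernel = _LAPLACIAN.to(img.device, img.dtype).repeat(c, 1, 1, 1)
    return F.conv2d(img, kernel, padding=1, groups=c)

def multiscale_laplacian_loss(rendered, reference, num_scales=3):
    """L1 difference of Laplacian responses across an image pyramid."""
    loss = 0.0
    for _ in range(num_scales):
        loss = loss + F.l1_loss(laplacian_response(rendered),
                                laplacian_response(reference))
        rendered = F.avg_pool2d(rendered, 2)    # next coarser pyramid level
        reference = F.avg_pool2d(reference, 2)
    return loss / num_scales
```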