📄 Abstract
Scene reconstruction has emerged as a central challenge in computer vision,
with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting
achieving remarkable progress. While Gaussian Splatting demonstrates strong
performance on large-scale datasets, it often struggles to capture fine details
or maintain realism in regions with sparse coverage, largely due to the
inherent limitations of sparse 3D training data.
In this work, we propose GauSSmart, a hybrid method that effectively bridges
2D foundational models and 3D Gaussian Splatting reconstruction. Our approach
integrates established 2D computer vision techniques, including convex
filtering and semantic feature supervision from foundational models such as
DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D
segmentation priors and high-dimensional feature embeddings, our method guides
the densification and refinement of Gaussian splats, improving coverage in
underrepresented areas and preserving intricate structural details.
We validate our approach across three datasets, where GauSSmart outperforms
existing Gaussian Splatting methods in the majority of evaluated scenes.
Our results demonstrate the significant potential of hybrid 2D-3D approaches,
highlighting how the thoughtful combination of 2D foundational models with 3D
reconstruction pipelines can overcome the limitations inherent in either
approach alone.
Authors (5)
Alexander Valverde
Brian Xu
Yuyin Zhou
Meng Xu
Hongyun Wang
Submitted
October 16, 2025
Key Contributions
GauSSmart proposes a hybrid method that combines 2D foundational models (like DINO) with 3D Gaussian Splatting for enhanced scene reconstruction. It leverages semantic features and geometric filtering to improve detail capture and realism, especially in regions with sparse training data, addressing limitations of existing Gaussian Splatting approaches.
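As a rough illustration of what "semantic feature supervision" could look like, the sketch below computes a cosine-distance loss between a feature map rendered from the 3D scene and a reference feature map from a 2D model such as DINO. The summary does not describe the paper's actual loss, so the function name `cosine_feature_loss`, the `(H, W, D)` array layout, and the choice of cosine distance are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def cosine_feature_loss(rendered, reference, eps=1e-8):
    """Mean (1 - cosine similarity) over all pixels.

    Hypothetical helper, not taken from the paper.
    rendered:  (H, W, D) features rendered from the 3D scene (assumed layout).
    reference: (H, W, D) features from a 2D model such as DINO (assumed layout).
    """
    # Normalize each per-pixel feature vector to unit length.
    r = rendered / (np.linalg.norm(rendered, axis=-1, keepdims=True) + eps)
    t = reference / (np.linalg.norm(reference, axis=-1, keepdims=True) + eps)
    # Per-pixel cosine similarity, in the range [-1, 1].
    cos = np.sum(r * t, axis=-1)
    # 0 when features align everywhere, 2 when they point in opposite directions.
    return float(np.mean(1.0 - cos))

# Toy usage with random arrays standing in for real feature maps.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
loss_same = cosine_feature_loss(features, features)
loss_opposite = cosine_feature_loss(features, -features)
```

In a real training loop this term would presumably be written in an autodiff framework and added to the photometric loss, so that gradients flow back to the Gaussian parameters; NumPy is used here only to keep the sketch self-contained.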
Business Value
Enables the creation of more detailed and realistic 3D environments for applications like virtual tours, architectural visualization, and game development, potentially reducing manual modeling effort.