arxiv_cv, Research Paper. Audience: researchers in computer vision and graphics, developers of VR/AR applications, 3D artists and modelers

GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering

Abstract

Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.
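The abstract names convex filtering without describing it. As a minimal sketch of one plausible reading, the Python below keeps only candidate 2D points that fall inside the convex hull of a trusted reference set (for example, pixels covered by a segmentation mask). The hull-based criterion and every name here (`convex_hull`, `inside_hull`, `filter_by_hull`) are assumptions for illustration, not the authors' procedure.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def filter_by_hull(reference, candidates):
    """Hypothetical filter: keep candidates inside the hull of the reference set."""
    hull = convex_hull(reference)
    return [p for p in candidates if inside_hull(hull, p)]
```

In this sketch, points on the hull boundary are kept; a stricter filter would use `> 0` in `inside_hull`.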

Authors (5)

Alexander Valverde

Brian Xu

Yuyin Zhou

Meng Xu

Hongyun Wang

Submitted

October 16, 2025

arXiv Category

cs.CV

Key Contributions

GauSSmart proposes a hybrid method that combines 2D foundational models (like DINO) with 3D Gaussian Splatting for enhanced scene reconstruction. It leverages semantic features and geometric filtering to improve detail capture and realism, especially in regions with sparse training data, addressing limitations of existing Gaussian Splatting approaches.
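Semantic feature supervision is commonly implemented as a distance between features rendered from the 3D representation and features extracted by the 2D model at the same pixels. The summary does not give the loss GauSSmart uses, so the following is a minimal sketch assuming a mean cosine-distance loss; plain Python lists stand in for feature tensors, and the names `cosine_similarity` and `feature_loss` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def feature_loss(rendered, target):
    """Mean cosine distance (1 - similarity) over paired feature vectors.

    rendered: per-pixel features rendered from the Gaussians (assumed input)
    target:   per-pixel features from a 2D model such as DINO (assumed input)
    """
    if len(rendered) != len(target) or not rendered:
        raise ValueError("rendered and target must be non-empty and equal in length")
    total = sum(1.0 - cosine_similarity(r, t) for r, t in zip(rendered, target))
    return total / len(rendered)
```

The loss is 0 when every rendered feature points in the same direction as its target and grows toward 2 as they become opposed, so it is insensitive to feature magnitude.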

Business Value

Enables the creation of more detailed and realistic 3D environments for applications like virtual tours, architectural visualization, and game development, potentially reducing manual modeling effort.

Paper Metadata

Innovation Type

Hybrid Approach

Deployment Feasibility

Moderate. Requires significant computational resources for training and rendering, but the output can be used in standard 3D pipelines.

Limitations Addressed

Lack of fine details in Gaussian Splatting reconstructions, poor realism in regions with sparse coverage, limitations of sparse 3D training data

Performance Gains

Improved detail and realism in 3D reconstructions, particularly in sparse regions, compared to standard Gaussian Splatting.

Technical Tags

3D reconstruction, Gaussian Splatting, Neural Radiance Fields (NeRF), 2D foundation models, geometric filtering, semantic features, DINO, sparse data, scene representation

Research Topics

3D Computer Vision, Scene Reconstruction, Generative Models, Deep Learning, Geometric Processing

Methods & Architectures

Gaussian Splatting, Neural Radiance Fields (NeRF), Convex filtering, Semantic feature supervision, DINO, Hybrid method, Foundation Models (e.g., DINO)

Applications & Tasks

Virtual Reality (VR), Augmented Reality (AR), Robotics, 3D Content Creation, Gaming, Improving 3D Reconstruction Detail, Handling Sparse Training Data, Enhancing Realism in Reconstructed Scenes, 3D Scene Reconstruction, Generating Realistic 3D Models

Datasets & Benchmarks

Datasets

ScanNet

PSNR, SSIM, LPIPS
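Of these metrics, PSNR has a simple closed form: 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and rendered images and MAX is the largest possible pixel value. A minimal sketch, assuming both images are flattened to equal-length lists of floats in [0, `max_value`]; the function name and input layout are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and equal in length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, `psnr([0.0, 1.0], [0.1, 0.9])` has an MSE of 0.01 and evaluates to about 20 dB. Higher is better for PSNR and SSIM, while lower is better for LPIPS.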

Related Fields

Computer Graphics, Robotics, Virtual Reality, Augmented Reality, Machine Learning

Keywords

3D reconstruction, Gaussian Splatting, NeRF, foundation models, DINO, geometric filtering, scene understanding, sparse data, computer vision, 3D graphics, realism, detail enhancement

Academic Context

#3D Computer Vision, #Scene Reconstruction, #Generative Models, #Deep Learning, #Geometric Processing

Technology Stack

Frameworks & Libraries

PyTorch

Programming Languages

Python

ML Infrastructure

CUDA

Commercial Potential

Potential Products

3D scanning software, Virtual environment creation tools, AR/VR content platforms

Target Industries

Gaming, Entertainment, Architecture, Real Estate, Robotics

Use Case Examples

Creating photorealistic virtual walkthroughs of buildings, Generating detailed 3D assets for video games, Robotic scene mapping and understanding

Competitive Edge

Improves upon existing Gaussian Splatting methods by integrating 2D semantic priors, leading to higher fidelity reconstructions, especially in challenging scenarios.

Market Opportunity

Significant growth in VR/AR and 3D content creation markets.

Revenue Models

Software licensing, cloud-based rendering services.

Resource Requirements

Compute Needs

High, especially for training and rendering large scenes.

Data Requirements

Large-scale 3D scene datasets with RGB-D information.

Deployment Constraints

Computational cost, memory requirements

Scalability

Scalability to very large scenes can be challenging due to memory and computation demands.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years

Patent Potential

Moderate, for the hybrid approach and specific filtering techniques.
