
GS4: Generalizable Sparse Splatting Semantic SLAM

Abstract

Traditional SLAM algorithms excel at camera tracking, but typically produce incomplete, low-resolution maps that are not tightly integrated with semantic prediction. Recent work integrates Gaussian Splatting (GS) into SLAM to enable dense, photorealistic 3D mapping, yet existing GS-based SLAM methods require per-scene optimization that is slow and consumes an excessive number of Gaussians. We present GS4, the first generalizable GS-based semantic SLAM system. Compared with prior approaches, GS4 runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance across color, depth, and semantic mapping as well as camera tracking. From an RGB-D video stream, GS4 incrementally builds and updates a set of 3D Gaussians using a feed-forward network. First, the Gaussian Prediction Model estimates a sparse set of Gaussian parameters from the input frame, integrating color and semantic prediction in the same backbone. Then, the Gaussian Refinement Network merges the new Gaussians with the existing set while avoiding redundancy. Finally, when significant pose changes are detected, we optimize the GS map for only 1-5 iterations to correct drift and floaters. Experiments on the real-world ScanNet and ScanNet++ benchmarks demonstrate state-of-the-art semantic SLAM performance, with strong generalization shown through zero-shot transfer to the NYUv2 and TUM RGB-D datasets.
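As a rough illustration of the pipeline described in the abstract, the sketch below wires the three stages (feed-forward Gaussian prediction, refinement-based merging, and brief optimization triggered by significant pose changes) into an incremental loop. All callables (`predict_gaussians`, `estimate_pose`, `refine_map`, `optimize_map`) and the translation-based pose-change test are placeholders standing in for the paper's networks, not the authors' actual interfaces.

```python
import numpy as np

def pose_distance(p, q):
    """Simple pose-change proxy: translation difference between two 4x4 camera poses."""
    return float(np.linalg.norm(np.asarray(p)[:3, 3] - np.asarray(q)[:3, 3]))

def run_incremental_slam(rgbd_stream, predict_gaussians, estimate_pose, refine_map,
                         optimize_map, pose_change_threshold=0.05, max_opt_iters=5):
    """Hypothetical incremental feed-forward semantic SLAM loop.

    Placeholder callables (assumed, not the paper's API):
      predict_gaussians(rgb, depth)       -> sparse per-frame Gaussians with color + semantics
      estimate_pose(frame_gs, map_gs)     -> 4x4 camera pose for the frame
      refine_map(map_gs, frame_gs, pose)  -> merged map with redundant Gaussians removed
      optimize_map(map_gs, rgb, depth, pose, n_iters) -> briefly optimized map
    """
    map_gaussians = None
    trajectory = []
    last_opt_pose = None

    for rgb, depth in rgbd_stream:
        # 1) Feed-forward prediction of a sparse set of Gaussians for this frame.
        frame_gaussians = predict_gaussians(rgb, depth)

        # 2) Track the camera against the current map.
        pose = estimate_pose(frame_gaussians, map_gaussians)
        trajectory.append(pose)

        # 3) Merge new Gaussians into the global map while avoiding redundancy.
        if map_gaussians is None:
            map_gaussians = frame_gaussians          # initialize the map on the first frame
        else:
            map_gaussians = refine_map(map_gaussians, frame_gaussians, pose)

        # 4) Only 1-5 optimization steps, and only on a significant pose change,
        #    to correct drift and remove floaters.
        if last_opt_pose is None or pose_distance(pose, last_opt_pose) > pose_change_threshold:
            map_gaussians = optimize_map(map_gaussians, rgb, depth, pose, n_iters=max_opt_iters)
            last_opt_pose = pose

    return map_gaussians, trajectory
```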
Authors (4)
Mingqi Jiang
Chanho Kim
Chen Ziwen
Li Fuxin
Submitted
June 6, 2025
arXiv Category
cs.CV

Key Contributions

GS4 is the first generalizable Gaussian Splatting-based semantic SLAM system. It runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance in color, depth, semantic mapping, and camera tracking by incrementally building and updating Gaussians using feed-forward networks.
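For concreteness, the snippet below sketches one plausible parameterization of a single semantic 3D Gaussian, pairing the usual splatting attributes with per-Gaussian semantic logits. The exact attribute layout, class count, and color representation are assumptions for illustration, not the paper's specification.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SemanticGaussian:
    """Hypothetical per-Gaussian state for a semantic splatting map."""
    mean: np.ndarray        # (3,) center position in world coordinates
    rotation: np.ndarray    # (4,) unit quaternion orienting the covariance
    scale: np.ndarray       # (3,) per-axis standard deviations
    opacity: float          # scalar in [0, 1]
    color: np.ndarray       # (3,) RGB (full systems often use SH coefficients)
    sem_logits: np.ndarray  # (num_classes,) semantic class scores

    @property
    def label(self) -> int:
        """Hard semantic label taken as the argmax over class logits."""
        return int(np.argmax(self.sem_logits))

# Example: one Gaussian with a 20-class label space (e.g., a ScanNet-style label set).
g = SemanticGaussian(
    mean=np.zeros(3),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),
    scale=np.full(3, 0.02),
    opacity=0.9,
    color=np.array([0.5, 0.4, 0.3]),
    sem_logits=np.zeros(20),
)
print(g.label)
```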

Business Value

Enables faster and more efficient creation of detailed, semantically rich 3D maps for applications like AR/VR content creation, robotic navigation, and digital twins.