
GS4: Generalizable Sparse Splatting Semantic SLAM

Abstract

Traditional SLAM algorithms excel at camera tracking, but typically produce incomplete, low-resolution maps that are not tightly integrated with semantic prediction. Recent work integrates Gaussian Splatting (GS) into SLAM to enable dense, photorealistic 3D mapping, yet existing GS-based SLAM methods require per-scene optimization that is slow and consumes an excessive number of Gaussians. We present GS4, the first generalizable GS-based semantic SLAM system. Compared with prior approaches, GS4 runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance across color, depth, and semantic mapping as well as camera tracking. From an RGB-D video stream, GS4 incrementally builds and updates a set of 3D Gaussians using a feed-forward network. First, the Gaussian Prediction Model estimates a sparse set of Gaussian parameters from the input frame, integrating color and semantic prediction in the same backbone. Then, the Gaussian Refinement Network merges the new Gaussians with the existing set while avoiding redundancy. Finally, when significant pose changes are detected, we optimize the GS map for only 1-5 iterations to correct drift and floaters. Experiments on the real-world ScanNet and ScanNet++ benchmarks demonstrate state-of-the-art semantic SLAM performance, with strong generalization shown through zero-shot transfer to the NYUv2 and TUM RGB-D datasets.
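As a rough illustration of the pipeline described in the abstract, the sketch below wires the three stages (feed-forward Gaussian prediction, refinement-based merging, and brief optimization triggered by significant pose changes) into an incremental loop. All callables (`predict_gaussians`, `estimate_pose`, `refine_map`, `optimize_map`) and the translation-based pose-change test are placeholders standing in for the paper's networks, not the authors' actual interfaces.

```python
import numpy as np

def pose_distance(p, q):
    """Simple pose-change proxy: translation difference between two 4x4 camera poses."""
    return float(np.linalg.norm(np.asarray(p)[:3, 3] - np.asarray(q)[:3, 3]))

def run_incremental_slam(rgbd_stream, predict_gaussians, estimate_pose, refine_map,
                         optimize_map, pose_change_threshold=0.05, max_opt_iters=5):
    """Hypothetical incremental feed-forward semantic SLAM loop.

    Placeholder callables (assumed, not the paper's API):
      predict_gaussians(rgb, depth)       -> sparse per-frame Gaussians with color + semantics
      estimate_pose(frame_gs, map_gs)     -> 4x4 camera pose for the frame
      refine_map(map_gs, frame_gs, pose)  -> merged map with redundant Gaussians removed
      optimize_map(map_gs, rgb, depth, pose, n_iters) -> briefly optimized map
    """
    map_gaussians = None
    trajectory = []
    last_opt_pose = None

    for rgb, depth in rgbd_stream:
        # 1) Feed-forward prediction of a sparse set of Gaussians for this frame.
        frame_gaussians = predict_gaussians(rgb, depth)

        # 2) Track the camera against the current map.
        pose = estimate_pose(frame_gaussians, map_gaussians)
        trajectory.append(pose)

        # 3) Merge new Gaussians into the global map while avoiding redundancy.
        if map_gaussians is None:
            map_gaussians = frame_gaussians          # initialize the map on the first frame
        else:
            map_gaussians = refine_map(map_gaussians, frame_gaussians, pose)

        # 4) Only 1-5 optimization steps, and only on a significant pose change,
        #    to correct drift and remove floaters.
        if last_opt_pose is None or pose_distance(pose, last_opt_pose) > pose_change_threshold:
            map_gaussians = optimize_map(map_gaussians, rgb, depth, pose, n_iters=max_opt_iters)
            last_opt_pose = pose

    return map_gaussians, trajectory
```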
Authors (4)
Mingqi Jiang
Chanho Kim
Chen Ziwen
Li Fuxin
Submitted
June 6, 2025
arXiv Category
cs.CV

Key Contributions

GS4 is the first generalizable Gaussian Splatting-based semantic SLAM system. It runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance in color, depth, semantic mapping, and camera tracking by incrementally building and updating Gaussians using feed-forward networks.
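For concreteness, the snippet below sketches one plausible parameterization of a single semantic 3D Gaussian, pairing the usual splatting attributes with per-Gaussian semantic logits. The exact attribute layout, class count, and color representation are assumptions for illustration, not the paper's specification.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SemanticGaussian:
    """Hypothetical per-Gaussian state for a semantic splatting map."""
    mean: np.ndarray        # (3,) center position in world coordinates
    rotation: np.ndarray    # (4,) unit quaternion orienting the covariance
    scale: np.ndarray       # (3,) per-axis standard deviations
    opacity: float          # scalar in [0, 1]
    color: np.ndarray       # (3,) RGB (full systems often use SH coefficients)
    sem_logits: np.ndarray  # (num_classes,) semantic class scores

    @property
    def label(self) -> int:
        """Hard semantic label taken as the argmax over class logits."""
        return int(np.argmax(self.sem_logits))

# Example: one Gaussian with a 20-class label space (e.g., a ScanNet-style label set).
g = SemanticGaussian(
    mean=np.zeros(3),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),
    scale=np.full(3, 0.02),
    opacity=0.9,
    color=np.array([0.5, 0.4, 0.3]),
    sem_logits=np.zeros(20),
)
print(g.label)
```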

Business Value

Enables faster and more efficient creation of detailed, semantically rich 3D maps for applications like AR/VR content creation, robotic navigation, and digital twins.