Abstract
Traditional SLAM algorithms excel at camera tracking, but typically produce
incomplete and low-resolution maps that are not tightly integrated with
semantic prediction. Recent work integrates Gaussian Splatting (GS) into SLAM
to enable dense, photorealistic 3D mapping, yet existing GS-based SLAM methods
require per-scene optimization that is slow and consumes an excessive number of
Gaussians. We present GS4, the first generalizable GS-based semantic SLAM
system. Compared with prior approaches, GS4 runs 10x faster, uses 10x fewer
Gaussians, and achieves state-of-the-art performance across color, depth,
semantic mapping and camera tracking. From an RGB-D video stream, GS4
incrementally builds and updates a set of 3D Gaussians using a feed-forward
network. First, the Gaussian Prediction Model estimates a sparse set of
Gaussian parameters from the input frame, integrating both color and semantic
prediction with the same backbone. Then, the Gaussian Refinement Network merges
new Gaussians with the existing set while avoiding redundancy. Finally, when
significant pose changes are detected, we propose to optimize the GS map for
only 1-5 iterations to correct drift and floaters. Experiments on the real-world
ScanNet and ScanNet++ benchmarks demonstrate state-of-the-art semantic SLAM
performance, with strong generalization capability shown through zero-shot
transfer to the NYUv2 and TUM RGB-D datasets.
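To make the pipeline in the abstract concrete, below is a minimal, self-contained sketch of the incremental feed-forward loop it describes: a prediction network maps each RGB-D frame to a sparse set of Gaussian parameters with color and semantic heads sharing one backbone, a merge step stands in for the Gaussian Refinement Network by dropping redundant Gaussians, and a short (1-5 step) optimization is triggered only when the tracked pose changes significantly. All module layouts, names (GaussianPredictionModel, refine_and_merge, short_refinement), tensor shapes, thresholds, and the placeholder loss are illustrative assumptions, not the paper's actual architecture or training objective.

```python
# Hedged sketch of a GS4-style incremental pipeline; every design detail here is
# an assumption for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class GaussianPredictionModel(nn.Module):
    """Feed-forward net: one RGB-D frame -> a sparse set of Gaussian parameters.
    A shared backbone feeds color and semantic heads (hypothetical layout)."""
    def __init__(self, num_gaussians=1024, num_classes=20):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten())
        feat = 64 * 8 * 8
        self.geom_head = nn.Linear(feat, num_gaussians * 7)            # xyz, scale, opacity
        self.color_head = nn.Linear(feat, num_gaussians * 3)           # RGB
        self.sem_head = nn.Linear(feat, num_gaussians * num_classes)   # semantic logits

    def forward(self, rgbd):  # rgbd: (1, 4, H, W)
        f = self.backbone(rgbd)
        n = self.geom_head.out_features // 7
        return {"geom": self.geom_head(f).view(n, 7),
                "color": self.color_head(f).view(n, 3),
                "sem": self.sem_head(f).view(n, -1)}

def refine_and_merge(map_gaussians, new_gaussians, dist_thresh=0.05):
    """Stand-in for the Gaussian Refinement Network: keep only new Gaussians whose
    centers are not already covered by the map, avoiding redundant splats."""
    if map_gaussians is None:
        return {k: v.detach() for k, v in new_gaussians.items()}
    d = torch.cdist(new_gaussians["geom"][:, :3], map_gaussians["geom"][:, :3])
    keep = d.min(dim=1).values > dist_thresh
    return {k: torch.cat([map_gaussians[k], new_gaussians[k][keep].detach()])
            for k in map_gaussians}

def short_refinement(map_gaussians, render_loss_fn, iters=5, lr=1e-3):
    """1-5 gradient steps on the map, run only after a large pose change."""
    params = {k: v.clone().requires_grad_(True) for k, v in map_gaussians.items()}
    opt = torch.optim.Adam(params.values(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        render_loss_fn(params).backward()
        opt.step()
    return {k: v.detach() for k, v in params.items()}

if __name__ == "__main__":
    model = GaussianPredictionModel()
    world_map, prev_pose = None, torch.eye(4)
    for t in range(3):                              # stand-in for an RGB-D stream
        rgbd = torch.rand(1, 4, 120, 160)
        pose = torch.eye(4); pose[0, 3] = 0.2 * t   # stand-in tracked pose
        with torch.no_grad():
            new_g = model(rgbd)
        world_map = refine_and_merge(world_map, new_g)
        if (pose[:3, 3] - prev_pose[:3, 3]).norm() > 0.15:  # "significant pose change"
            # Placeholder loss; a real system would use a differentiable GS render loss.
            dummy_loss = lambda p: sum(v.pow(2).mean() for v in p.values())
            world_map = short_refinement(world_map, dummy_loss, iters=3)
        prev_pose = pose
        print(f"frame {t}: map has {world_map['geom'].shape[0]} Gaussians")
```

The key design point this sketch mirrors is that per-frame Gaussian prediction is a single feed-forward pass, so the only per-scene optimization is the occasional handful of refinement steps, which is what keeps the method fast and the Gaussian count low compared with per-scene-optimized GS SLAM.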
Authors (4)
Mingqi Jiang
Chanho Kim
Chen Ziwen
Li Fuxin
Key Contributions
GS4 is the first generalizable Gaussian Splatting-based semantic SLAM system. It runs 10x faster, uses 10x fewer Gaussians, and achieves state-of-the-art performance in color, depth, semantic mapping, and camera tracking by incrementally building and updating Gaussians using feed-forward networks.
Business Value
Enables faster and more efficient creation of detailed, semantically rich 3D maps for applications like AR/VR content creation, robotic navigation, and digital twins.