📄 Abstract
Understanding 3D scenes is pivotal for autonomous driving, robotics, and augmented reality. Recent semantic Gaussian Splatting approaches leverage large-scale 2D vision models to project 2D semantic features onto 3D scenes. However, they suffer from two major limitations: (1) insufficient contextual cues for individual masks during preprocessing and (2) inconsistencies and missing details when fusing multi-view features from these 2D models. In this paper, we introduce OpenInsGaussian, an Open-vocabulary Instance Gaussian segmentation framework with Context-aware Cross-view Fusion. Our method consists of two modules: Context-Aware Feature Extraction, which augments each mask with rich semantic context, and Attention-Driven Feature Aggregation, which selectively fuses multi-view features to mitigate alignment errors and incompleteness. Through extensive experiments on benchmark datasets, OpenInsGaussian achieves state-of-the-art results in open-vocabulary 3D Gaussian segmentation, outperforming existing baselines by a large margin. These findings underscore the robustness and generality of our proposed approach, marking a significant step forward in 3D scene understanding and its practical deployment across diverse real-world scenarios.
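The abstract gives no implementation details, but the attention-driven aggregation it describes can be illustrated with a minimal sketch: per-view features for one 3D instance are fused with attention weights so that views agreeing with the consensus dominate, down-weighting misaligned or incomplete observations. Everything below (function name, the mean-feature query, the temperature value) is an assumption for illustration, not the paper's actual module.

```python
# Hypothetical sketch of attention-driven multi-view feature fusion;
# the paper's actual aggregation module may differ.
import torch
import torch.nn.functional as F

def fuse_multiview_features(view_feats: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """Fuse per-view semantic features for one 3D instance.

    view_feats: (V, D) tensor, one D-dim feature per observing view.
    Returns a single (D,) fused feature.
    """
    # Normalize so dot products behave like cosine similarities.
    feats = F.normalize(view_feats, dim=-1)              # (V, D)
    # Assumed query: the normalized mean feature. Views consistent
    # with this consensus receive higher attention scores.
    query = F.normalize(feats.mean(dim=0), dim=-1)       # (D,)
    scores = feats @ query / temperature                 # (V,)
    weights = scores.softmax(dim=0)                      # (V,)
    # Attention-weighted sum of the original (unnormalized) features.
    fused = (weights.unsqueeze(-1) * view_feats).sum(dim=0)  # (D,)
    return fused

# Example: 5 views, 512-dim CLIP-style features.
fused = fuse_multiview_features(torch.randn(5, 512))
```

A hard-assignment or averaging baseline would weight all views equally; the attention weighting above is one simple way to realize the selective fusion the abstract describes.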
Authors (6)
Tianyu Huang
Runnan Chen
Dongting Hu
Fengming Huang
Mingming Gong
Tongliang Liu
Submitted
October 21, 2025
Key Contributions
Introduces OpenInsGaussian, an open-vocabulary instance Gaussian segmentation framework with context-aware cross-view fusion. It addresses insufficient per-mask contextual cues and inconsistent multi-view fusion by employing attention mechanisms for selective feature aggregation.
Business Value
Enables more comprehensive and accurate 3D scene understanding for applications like autonomous navigation and robotic manipulation, leading to improved safety and task performance. Facilitates richer AR experiences.