Abstract
Semantic Scene Completion (SSC) aims to infer complete 3D geometry and
semantics from monocular images, serving as a crucial capability for
camera-based perception in autonomous driving. However, existing SSC methods
relying on temporal stacking or depth projection often lack explicit motion
reasoning and struggle with occlusions and noisy depth supervision. We propose
CurriFlow, a novel semantic occupancy prediction framework that integrates
optical flow-based temporal alignment with curriculum-guided depth fusion.
CurriFlow employs a multi-level fusion strategy to align segmentation, visual,
and depth features across frames using pre-trained optical flow, thereby
improving temporal consistency and dynamic object understanding. To enhance
geometric robustness, a curriculum learning mechanism progressively transitions
from sparse yet accurate LiDAR depth to dense but noisy stereo depth during
training, ensuring stable optimization and seamless adaptation to real-world
deployment. Furthermore, semantic priors from the Segment Anything Model (SAM)
provide category-agnostic supervision, strengthening voxel-level semantic
learning and spatial consistency. Experiments on the SemanticKITTI benchmark
demonstrate that CurriFlow achieves state-of-the-art performance with a mean
IoU of 16.9, validating the effectiveness of our motion-guided and
curriculum-aware design for camera-based 3D semantic scene completion.
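The abstract says CurriFlow uses pre-trained optical flow to align segmentation, visual, and depth features across frames. Backward warping is a common way to do this kind of alignment. Each current-frame pixel reads the previous frame's feature at the location the flow points to, using bilinear interpolation. The sketch below assumes that operator. The function names, the single-channel list-of-lists layout, and the flow convention are illustrative assumptions, not details from the paper. The assumed convention is a per-pixel displacement, in pixels, from the current frame to the previous frame.

```python
import math
from typing import List

Grid = List[List[float]]  # H x W single-channel feature map (illustrative layout)


def bilinear_sample(feat: Grid, x: float, y: float) -> float:
    """Sample feat at continuous (x, y); out-of-bounds neighbours contribute 0."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    total = 0.0
    # Accumulate the four neighbouring pixels weighted by bilinear coefficients.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * feat[yy][xx]
    return total


def warp_with_flow(prev_feat: Grid, flow_u: Grid, flow_v: Grid) -> Grid:
    """Backward-warp prev_feat into the current frame (hypothetical helper).

    flow_u[y][x] and flow_v[y][x] give the horizontal and vertical displacement,
    in pixels, from current-frame pixel (x, y) to its match in the previous frame.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    return [
        [bilinear_sample(prev_feat, x + flow_u[y][x], y + flow_v[y][x]) for x in range(w)]
        for y in range(h)
    ]
```

For multi-channel features such as segmentation logits, visual features, or depth, the same warp would be applied to each channel. Zero-filling samples that fall outside the image is one choice. Clamping to the border is a common alternative.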
Key Contributions
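CurriFlow introduces a framework for camera-based 3D Semantic Scene Completion that combines optical flow-based temporal alignment with curriculum-guided depth fusion. Aligning features across frames with pre-trained optical flow gives the model explicit motion reasoning. According to the authors, this handles occlusions and dynamic objects better than previous methods based on temporal stacking or depth projection. The curriculum learning strategy improves geometric robustness by shifting progressively from sparse but accurate LiDAR depth to dense but noisy stereo depth during training.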
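The curriculum moves the depth source from LiDAR to stereo over the course of training, but the summary does not give the schedule. The sketch below assumes one simple form, in the spirit of scheduled sampling. Training starts with a warm-up period on LiDAR depth only. After that, a linear ramp raises `alpha`, the probability that a training sample uses stereo depth. `curriculum_alpha`, `select_depth`, and the epoch counts are hypothetical. The paper may instead blend the two depth maps or reweight depth losses.

```python
import random


def curriculum_alpha(epoch: int, warmup_epochs: int, ramp_epochs: int) -> float:
    """Probability of using stereo depth at a given epoch (hypothetical schedule).

    Returns 0.0 during warm-up (LiDAR only), then ramps linearly and
    stays at 1.0 (stereo only) once ramp_epochs further epochs have passed.
    """
    if epoch < warmup_epochs:
        return 0.0
    if ramp_epochs <= 0:
        return 1.0
    return min(1.0, (epoch - warmup_epochs) / ramp_epochs)


def select_depth(lidar_depth, stereo_depth, alpha: float, rng: random.Random):
    """Pick this sample's depth source: stereo with probability alpha, else LiDAR."""
    return stereo_depth if rng.random() < alpha else lidar_depth


if __name__ == "__main__":
    # Usage: inspect the schedule for a 5-epoch warm-up followed by a 10-epoch ramp.
    rng = random.Random(0)
    for epoch in (0, 5, 10, 15, 20):
        alpha = curriculum_alpha(epoch, warmup_epochs=5, ramp_epochs=10)
        print(epoch, alpha, select_depth("lidar", "stereo", alpha, rng))
```

Switching whole samples keeps each depth map internally consistent. Early epochs see only the accurate LiDAR geometry, and training ends on the stereo depth that would be available at deployment.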
Business Value
Enables more robust and accurate camera-based 3D perception for autonomous vehicles. A complete estimate of scene geometry and semantics, including occluded regions and moving objects, can support safer navigation.