CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion

Abstract

Semantic Scene Completion (SSC) aims to infer complete 3D geometry and semantics from monocular images, serving as a crucial capability for camera-based perception in autonomous driving. However, existing SSC methods relying on temporal stacking or depth projection often lack explicit motion reasoning and struggle with occlusions and noisy depth supervision. We propose CurriFlow, a novel semantic occupancy prediction framework that integrates optical flow-based temporal alignment with curriculum-guided depth fusion. CurriFlow employs a multi-level fusion strategy to align segmentation, visual, and depth features across frames using pre-trained optical flow, thereby improving temporal consistency and dynamic object understanding. To enhance geometric robustness, a curriculum learning mechanism progressively transitions from sparse yet accurate LiDAR depth to dense but noisy stereo depth during training, ensuring stable optimization and seamless adaptation to real-world deployment. Furthermore, semantic priors from the Segment Anything Model (SAM) provide category-agnostic supervision, strengthening voxel-level semantic learning and spatial consistency. Experiments on the SemanticKITTI benchmark demonstrate that CurriFlow achieves state-of-the-art performance with a mean IoU of 16.9, validating the effectiveness of our motion-guided and curriculum-aware design for camera-based 3D semantic scene completion.
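The flow-based temporal alignment mentioned in the abstract can be illustrated with a generic backward warp: each pixel of the current frame looks up its matching location in the previous frame and samples the feature there. The sketch below is not CurriFlow's actual module. The function name `warp_with_flow`, the single-channel list-of-lists layout, the flow direction (current frame to previous frame), and the zero fill for out-of-view pixels are all assumptions made for illustration.

```python
import math

def warp_with_flow(prev_feat, flow, fill=0.0):
    """Backward-warp a 2D feature map from the previous frame into the current one.

    prev_feat: H x W list of floats (single channel, for brevity).
    flow:      H x W list of (dx, dy) pairs. Assumption: each vector points from
               the current-frame pixel (x, y) to its match in the previous frame.
    Returns (warped, valid). valid[y][x] is False where the source location
    falls outside the previous frame, e.g. for newly visible regions.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    warped = [[fill] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy  # source location in the previous frame
            if not (0 <= sx <= w - 1 and 0 <= sy <= h - 1):
                continue  # out of view: keep the fill value
            # Bilinear interpolation between the four neighbouring pixels.
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            top = prev_feat[y0][x0] * (1 - wx) + prev_feat[y0][x1] * wx
            bot = prev_feat[y1][x0] * (1 - wx) + prev_feat[y1][x1] * wx
            warped[y][x] = top * (1 - wy) + bot * wy
            valid[y][x] = True
    return warped, valid
```

In a multi-level setup like the one the abstract describes, the same warp would presumably be applied to segmentation, visual, and depth features alike, with the validity mask indicating where the previous frame carries no usable information.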

Key Contributions

CurriFlow introduces a framework for 3D Semantic Scene Completion that combines optical flow-based temporal alignment with curriculum-guided depth fusion. The flow-based alignment reasons about motion explicitly and is intended to handle occlusions better than methods based on temporal stacking or depth projection. The curriculum learning strategy improves geometric robustness by moving training progressively from sparse but accurate LiDAR depth to dense but noisy stereo depth.
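The curriculum idea can be sketched as a schedule that controls how much of the depth signal comes from stereo at a given training step. The summary does not state the schedule CurriFlow uses, so everything below is an assumption for illustration: the linear ramp, the hypothetical names `stereo_weight` and `fuse_depth`, the `warmup_frac` and `ramp_frac` parameters, and the choice to fall back to stereo wherever LiDAR has no return.

```python
def stereo_weight(step, total_steps, warmup_frac=0.2, ramp_frac=0.6):
    """Share of depth taken from stereo at a training step.

    0.0 means LiDAR only, 1.0 means stereo only. The linear ramp and both
    fractions are illustrative assumptions, not values from the paper.
    """
    if total_steps <= 0 or ramp_frac <= 0:
        raise ValueError("total_steps and ramp_frac must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    if progress <= warmup_frac:
        return 0.0  # early training: rely on accurate LiDAR depth
    return min((progress - warmup_frac) / ramp_frac, 1.0)


def fuse_depth(lidar, stereo, weight):
    """Blend per-pixel depths given a stereo weight in [0, 1].

    lidar uses None for pixels without a return (LiDAR is sparse).
    Assumption: such pixels fall back to the stereo estimate.
    """
    fused = []
    for l, s in zip(lidar, stereo):
        if l is None:
            fused.append(s)
        else:
            fused.append((1.0 - weight) * l + weight * s)
    return fused


# Example: halfway through training, the two sources are mixed equally.
w = stereo_weight(step=50, total_steps=100)
depth = fuse_depth([10.0, None], [12.0, 7.0], w)
```

By the end of the ramp the model sees stereo depth only, which matches the deployment setting the abstract describes, where LiDAR is not available.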

Business Value

Enables more robust and accurate 3D perception for autonomous vehicles, improving safety and navigation capabilities by providing a complete understanding of the scene geometry and semantics, even in challenging conditions.