Abstract
This paper introduces a novel self-supervised learning framework for
enhancing 3D perception in autonomous driving scenes. Our approach, NCLR,
centers on 2D-3D neural calibration, a novel pretext task that estimates the
rigid pose aligning the camera and LiDAR coordinate systems. First, we propose
a learnable transformation alignment to bridge the
domain gap between image and point cloud data, converting features into a
unified representation space for effective comparison and matching. Second, we
use the fused features to identify the overlapping region between the image and
the point cloud. Third, we establish dense 2D-3D correspondences to estimate the rigid
pose. The framework not only learns fine-grained point-to-pixel matching
but also aligns the image and point cloud at a holistic level,
capturing the LiDAR-to-camera extrinsic parameters. We demonstrate the
efficacy of NCLR by applying the pre-trained backbone to downstream tasks, such
as LiDAR-based 3D semantic segmentation, object detection, and panoptic
segmentation. Comprehensive experiments on various datasets illustrate the
superiority of NCLR over existing self-supervised methods. The results confirm
that joint learning from different modalities significantly enhances the
network's understanding and the effectiveness of the learned representations.
The code is publicly available at https://github.com/Eaphan/NCLR.
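
As a rough illustration of the first step, the sketch below projects per-pixel and per-point features into a shared embedding space and scores them by cosine similarity. This is not the authors' implementation; the module name, dimensions, and architecture are assumptions for illustration only.

```python
# Hypothetical sketch of a learnable transformation alignment: project image
# and point-cloud features into one embedding space for comparison/matching.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformationAlignment(nn.Module):
    """Maps 2D (pixel) and 3D (point) features into a unified space."""

    def __init__(self, img_dim: int, pts_dim: int, embed_dim: int = 128):
        super().__init__()
        # Small learnable projection heads; the actual architecture may differ.
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.pts_proj = nn.Sequential(
            nn.Linear(pts_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, img_feats: torch.Tensor, pts_feats: torch.Tensor) -> torch.Tensor:
        # img_feats: (N_pixels, img_dim); pts_feats: (N_points, pts_dim)
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        pts_emb = F.normalize(self.pts_proj(pts_feats), dim=-1)
        # (N_points, N_pixels) cosine-similarity matrix; a row-wise softmax
        # over it yields soft point-to-pixel correspondences.
        return pts_emb @ img_emb.T
```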
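
The third step has a classical counterpart: given dense 2D-3D correspondences and the camera intrinsics, the rigid LiDAR-to-camera pose can be recovered with a RANSAC PnP solver. The sketch below uses OpenCV for this and is an assumed stand-in, not the paper's actual estimator.

```python
# Hypothetical sketch: recover the LiDAR-to-camera extrinsics from matched
# 3D points and 2D pixels with OpenCV's RANSAC PnP solver.
import cv2
import numpy as np

def estimate_extrinsics(points_3d: np.ndarray,  # (N, 3) LiDAR points
                        pixels_2d: np.ndarray,  # (N, 2) matched pixel coords
                        K: np.ndarray) -> np.ndarray:  # (3, 3) intrinsics
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        pixels_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=3.0,  # pixel threshold for RANSAC inliers
    )
    if not ok:
        raise RuntimeError("PnP failed to recover a pose")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T  # 4x4 rigid transform from LiDAR to camera coordinates
```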