Abstract: Online HD map construction is a fundamental task in autonomous driving
systems, aiming to acquire semantic information of map elements around the ego
vehicle based on real-time sensor inputs. Recently, several approaches have
achieved promising results by incorporating offline priors such as SD maps and
HD maps or by fusing multi-modal data. However, these methods rely on
potentially outdated offline maps and on multi-modal sensor suites, which
introduce avoidable computational overhead at inference. To address these
limitations, we adopt a knowledge distillation strategy that transfers
knowledge from prior-informed multi-modal models to an efficient, low-cost,
vision-centric student model. Specifically, we propose MapKD, a novel multi-level cross-modal
knowledge distillation framework with an innovative Teacher-Coach-Student (TCS)
paradigm. This framework consists of: (1) a camera-LiDAR fusion model with
SD/HD map priors serving as the teacher; (2) a vision-centric coach model with
prior knowledge and simulated LiDAR to bridge the cross-modal knowledge
transfer gap; and (3) a lightweight vision-based student model. Additionally,
we introduce two targeted knowledge distillation strategies: Token-Guided 2D
Patch Distillation (TGPD) for bird's eye view feature alignment and Masked
Semantic Response Distillation (MSRD) for semantic learning guidance. Extensive
experiments on the challenging nuScenes dataset demonstrate that MapKD improves
the student model by +6.68 mIoU and +10.94 mAP while simultaneously
accelerating inference speed. The code is available at: https://github.com/2004yan/MapKD2026.
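
To make the two distillation strategies named in the abstract more concrete, the following is a minimal, hedged sketch of what a patch-masked BEV feature alignment loss (in the spirit of TGPD) and a masked soft-response loss (in the spirit of MSRD) could look like in PyTorch. The function names, tensor shapes, and the use of MSE and temperature-scaled KL divergence are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def patch_feature_alignment_loss(student_bev, teacher_bev, token_mask):
    """Align student BEV features to the teacher's on selected patches.

    Assumed shapes (hypothetical): student_bev, teacher_bev are (B, C, H, W)
    BEV feature maps; token_mask is a (B, 1, H, W) binary mask marking the
    patches chosen for distillation (e.g., by a token-guided selection step).
    """
    sq_err = (student_bev - teacher_bev) ** 2          # per-element squared error
    masked = sq_err * token_mask                        # keep only selected patches
    return masked.sum() / token_mask.sum().clamp(min=1.0)


def masked_response_loss(student_logits, teacher_logits, mask, temperature=2.0):
    """KL divergence between softened semantic predictions on masked regions.

    Assumed shapes (hypothetical): student_logits, teacher_logits are
    (B, K, H, W) per-class map-element logits; mask is (B, 1, H, W) and
    restricts the response distillation to the selected locations.
    """
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    kl = F.kl_div(s, t, reduction="none").sum(dim=1, keepdim=True)  # (B, 1, H, W)
    return (kl * mask).sum() / mask.sum().clamp(min=1.0) * temperature ** 2
```

In a typical distillation setup, these terms would be added (with weighting coefficients) to the student's standard map-construction loss during training, while the teacher and coach models are kept frozen; the exact weighting and masking policy used by MapKD is described in the paper, not here.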