📄 Abstract
Infrastructure-based perception plays a crucial role in intelligent
transportation systems, offering global situational awareness and enabling
cooperative autonomy. However, existing camera-based detection models often
underperform in such scenarios due to challenges such as multi-view
infrastructure setup, diverse camera configurations, degraded visual inputs,
and various road layouts. We introduce MIC-BEV, a Transformer-based
bird's-eye-view (BEV) perception framework for infrastructure-based
multi-camera 3D object detection. MIC-BEV flexibly supports a variable number
of cameras with heterogeneous intrinsic and extrinsic parameters and
demonstrates strong robustness under sensor degradation. The proposed
graph-enhanced fusion module in MIC-BEV integrates multi-view image features
into the BEV space by exploiting geometric relationships between cameras and
BEV cells alongside latent visual cues. To support training and evaluation, we
introduce M2I, a synthetic dataset for infrastructure-based object detection,
featuring diverse camera configurations, road layouts, and environmental
conditions. Extensive experiments on both M2I and the real-world dataset
RoScenes demonstrate that MIC-BEV achieves state-of-the-art performance in 3D
object detection. It also remains robust under challenging conditions,
including extreme weather and sensor degradation. These results highlight the
potential of MIC-BEV for real-world deployment. The dataset and source code are
available at: https://github.com/HandsomeYun/MIC-BEV.
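
To make the geometric side of the abstract concrete, the sketch below shows one common way to relate BEV cells to a variable number of cameras with different intrinsics and extrinsics: project each cell center through a pinhole model and keep a camera-to-cell edge wherever the cell lands inside that camera's image. This is an illustrative assumption, not the MIC-BEV fusion module itself. The function names (`bev_grid_centers`, `project_to_camera`, `visibility_edges`), the camera dictionary keys, and the ground-plane grid are all hypothetical.

```python
import numpy as np


def bev_grid_centers(x_range, y_range, cell_size, z=0.0):
    """Return (N, 3) world-frame centers of a flat BEV grid at height z."""
    xs = np.arange(x_range[0] + cell_size / 2, x_range[1], cell_size)
    ys = np.arange(y_range[0] + cell_size / 2, y_range[1], cell_size)
    gx, gy = np.meshgrid(xs, ys, indexing="xy")
    return np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, z)], axis=1)


def project_to_camera(points_world, K, T_cam_from_world, image_size):
    """Pinhole projection of world points into one camera.

    K is the 3x3 intrinsic matrix, T_cam_from_world the 4x4 extrinsic
    transform, image_size is (width, height) in pixels.
    Returns pixel coordinates (N, 2) and a boolean visibility mask (N,).
    """
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    pts_cam = (T_cam_from_world @ homog.T).T[:, :3]

    depth = pts_cam[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; such points are masked out anyway.
    safe_depth = np.where(in_front, depth, 1.0)

    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / safe_depth[:, None]

    width, height = image_size
    in_image = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & in_image


def visibility_edges(points_world, cameras):
    """List (camera_index, cell_index) pairs for every visible cell.

    `cameras` is a list of any length; each entry is a dict with the
    hypothetical keys "K", "T_cam_from_world", and "image_size".
    """
    edges = []
    for cam_idx, cam in enumerate(cameras):
        _, visible = project_to_camera(
            points_world, cam["K"], cam["T_cam_from_world"], cam["image_size"]
        )
        edges.extend((cam_idx, int(i)) for i in np.flatnonzero(visible))
    return edges
```

Because the cameras are passed as a plain list, the same code handles one camera or many, and each camera carries its own intrinsics, extrinsics, and resolution. A learned fusion module would then weight image features along these edges; that part is not sketched here.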
Authors (8)
Yun Zhang
Zhaoliang Zheng
Johnson Liu
Zhiyu Huang
Zewei Zhou
Zonglin Meng
+2 more
Submitted
October 28, 2025
Key Contributions
MIC-BEV is a Transformer-based BEV perception framework for infrastructure-based multi-camera 3D object detection. Its graph-enhanced fusion module integrates multi-view image features into the BEV space by combining geometric relationships between cameras and BEV cells with latent visual cues. This design supports a variable number of cameras with heterogeneous intrinsic and extrinsic parameters and stays robust under sensor degradation.
Business Value
More reliable and comprehensive 3D perception from roadside infrastructure could improve the safety and efficiency of intelligent transportation systems and autonomous vehicles.