MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection


Abstract: Infrastructure-based perception plays a crucial role in intelligent transportation systems, offering global situational awareness and enabling cooperative autonomy. However, existing camera-based detection models often underperform in such scenarios due to challenges such as multi-view infrastructure setup, diverse camera configurations, degraded visual inputs, and various road layouts. We introduce MIC-BEV, a Transformer-based bird's-eye-view (BEV) perception framework for infrastructure-based multi-camera 3D object detection. MIC-BEV flexibly supports a variable number of cameras with heterogeneous intrinsic and extrinsic parameters and demonstrates strong robustness under sensor degradation. The proposed graph-enhanced fusion module in MIC-BEV integrates multi-view image features into the BEV space by exploiting geometric relationships between cameras and BEV cells alongside latent visual cues. To support training and evaluation, we introduce M2I, a synthetic dataset for infrastructure-based object detection, featuring diverse camera configurations, road layouts, and environmental conditions. Extensive experiments on both M2I and the real-world dataset RoScenes demonstrate that MIC-BEV achieves state-of-the-art performance in 3D object detection. It also remains robust under challenging conditions, including extreme weather and sensor degradation. These results highlight the potential of MIC-BEV for real-world deployment. The dataset and source code are available at: https://github.com/HandsomeYun/MIC-BEV.
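The geometric relationship between cameras and BEV cells mentioned in the abstract can be illustrated with a standard pinhole projection. The sketch below is not taken from the MIC-BEV code base; the `Camera` class, its field names, and `bev_camera_relations` are hypothetical, and it assumes ideal pinhole cameras without lens distortion. For each BEV cell centre it records which cameras see the cell, together with the pixel location and depth, which is one plausible way to build camera-to-cell edges when the number of cameras and their parameters vary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera; the field names are illustrative only."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int
    R: Tuple[Vec3, Vec3, Vec3]  # row-major rotation, world -> camera
    t: Vec3                     # translation, world -> camera


def project(cam: Camera, p: Vec3) -> Optional[Tuple[float, float, float]]:
    """Return (u, v, depth) if world point p lands inside the image, else None."""
    xc, yc, zc = (sum(r[i] * p[i] for i in range(3)) + ti
                  for r, ti in zip(cam.R, cam.t))
    if zc <= 0.0:  # point is behind the camera
        return None
    u = cam.fx * xc / zc + cam.cx
    v = cam.fy * yc / zc + cam.cy
    if 0.0 <= u < cam.width and 0.0 <= v < cam.height:
        return (u, v, zc)
    return None


def bev_camera_relations(
    cams: List[Camera], cells: List[Vec3]
) -> List[List[Tuple[int, float, float, float]]]:
    """For each BEV cell centre, list (camera index, u, v, depth) per visible camera."""
    relations = []
    for cell in cells:
        edges = []
        for idx, cam in enumerate(cams):
            hit = project(cam, cell)
            if hit is not None:
                edges.append((idx, *hit))
        relations.append(edges)
    return relations


# Two downward-looking cameras with different intrinsics, mounted 10 m above
# the ground at world positions (0, 0, 10) and (5, 0, 10).
DOWN = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))
cams = [
    Camera(500.0, 500.0, 320.0, 240.0, 640, 480, DOWN, (0.0, 0.0, 10.0)),
    Camera(800.0, 800.0, 640.0, 360.0, 1280, 720, DOWN, (-5.0, 0.0, 10.0)),
]
cells = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (-10.0, 0.0, 0.0)]
relations = bev_camera_relations(cams, cells)
for cell, edges in zip(cells, relations):
    print(cell, "seen by cameras", [e[0] for e in edges])
```

Because the relation list is built per cell, adding or removing a camera only changes the number of edges, not the shape of the BEV grid.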
Authors (8): Yun Zhang, Zhaoliang Zheng, Johnson Liu, Zhiyu Huang, Zewei Zhou, Zonglin Meng, +2 more
Submitted: October 28, 2025
arXiv Category: cs.CV

Key Contributions

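MIC-BEV is a Transformer-based BEV perception framework for infrastructure-based multi-camera 3D object detection. It targets heterogeneous camera setups and sensor degradation with a graph-enhanced fusion module that integrates multi-view image features into the BEV space, using geometric relationships between cameras and BEV cells alongside latent visual cues. The framework supports a variable number of cameras with differing intrinsic and extrinsic parameters, and the authors report state-of-the-art 3D detection results on both the synthetic M2I dataset and the real-world RoScenes dataset.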
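To make the idea of fusing a variable number of camera views concrete, here is a minimal sketch of relation-weighted fusion for a single BEV cell. It uses a plain masked softmax over per-camera relation scores as a simplified stand-in; it is not the graph-enhanced fusion module from the paper, and the function name `fuse_cell`, its parameters, and the score values are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def fuse_cell(
    features: Sequence[Optional[Sequence[float]]],
    scores: Sequence[float],
    dim: int,
) -> List[float]:
    """Softmax-weighted average of the available per-camera features for one BEV cell.

    features[i] is None when camera i does not see the cell or its input is
    unavailable; such cameras are masked out before normalisation.
    """
    visible = [(f, s) for f, s in zip(features, scores) if f is not None]
    if not visible:
        return [0.0] * dim  # no camera contributes to this cell
    max_score = max(s for _, s in visible)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for _, s in visible]
    total = sum(weights)
    fused = [0.0] * dim
    for (feat, _), w in zip(visible, weights):
        for k in range(dim):
            fused[k] += (w / total) * feat[k]
    return fused


# Camera 1 is missing, so its (high) score is ignored and the two remaining
# cameras share the weight according to their own scores.
fused = fuse_cell([[1.0, 0.0], None, [0.0, 1.0]], scores=[0.0, 5.0, 0.0], dim=2)
empty = fuse_cell([None, None], scores=[1.0, 2.0], dim=2)
print(fused, empty)
```

Masking before normalisation means a missing or degraded camera simply drops out of the weighted sum, so the same function handles any number of cameras per cell.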

Business Value

More reliable and comprehensive 3D perception from roadside infrastructure could improve the safety and efficiency of intelligent transportation systems and autonomous vehicles.