📄 Abstract
RGB cameras excel at capturing rich texture details with high spatial
resolution, whereas event cameras offer exceptional temporal resolution and a
high dynamic range (HDR). Leveraging their complementary strengths can
substantially enhance object tracking under challenging conditions, such as
high-speed motion, HDR environments, and dynamic background interference.
However, a significant spatio-temporal asymmetry exists between these two
modalities due to their fundamentally different imaging mechanisms, hindering
effective multi-modal integration. To address this issue, we propose
{Hierarchical Asymmetric Distillation} (HAD), a multi-modal knowledge
distillation framework that explicitly models and mitigates spatio-temporal
asymmetries. Specifically, HAD adopts a hierarchical alignment strategy that
minimizes information loss while preserving the student network's
computational efficiency and parameter compactness. Extensive experiments
demonstrate that HAD consistently outperforms state-of-the-art methods, and
comprehensive ablation studies further validate the effectiveness and necessity
of each designed component. The code will be released soon.
Authors (6)
Yao Deng
Xian Zhong
Wenxuan Liu
Zhaofei Yu
Jingling Yuan
Tiejun Huang
Submitted
October 22, 2025
Key Contributions
HAD is a novel multi-modal knowledge distillation framework designed to mitigate spatio-temporal asymmetries between RGB and event cameras for object tracking. It employs a hierarchical alignment strategy to minimize information loss while maintaining efficiency, enabling better fusion of complementary sensor strengths.
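To make the hierarchical alignment idea concrete, the sketch below shows a generic multi-stage feature distillation loss: student features at several backbone stages are linearly projected to the teacher's channel width and matched against frozen teacher features with an MSE penalty. This is an illustrative, framework-agnostic NumPy sketch of the general technique, not the paper's actual HAD implementation; the stage dimensions, projection matrices, and function name are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def hierarchical_distill_loss(student_feats, teacher_feats, projections):
    """Average per-stage MSE between projected student features and
    (frozen) teacher features. Hypothetical sketch, not the paper's code."""
    losses = []
    for W, fs, ft in zip(projections, student_feats, teacher_feats):
        aligned = fs @ W  # project student channels to the teacher's width
        losses.append(np.mean((aligned - ft) ** 2))
    return float(np.mean(losses))

# Toy example: three backbone stages with differing channel widths.
stages = [(64, 96), (128, 192), (256, 384)]  # (student_dim, teacher_dim)
student = [rng.standard_normal((100, s)) for s, t in stages]
teacher = [rng.standard_normal((100, t)) for s, t in stages]
projs = [rng.standard_normal((s, t)) * 0.01 for s, t in stages]

loss = hierarchical_distill_loss(student, teacher, projs)
```

In practice the teacher branch is detached from the gradient graph so only the student and the projection layers are trained; averaging over stages keeps the loss scale independent of how many alignment levels are used.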
Business Value
Improves the robustness and accuracy of object tracking systems by effectively combining data from different sensor types, leading to safer autonomous systems and more reliable surveillance.