MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention

Abstract

Recognizing the motion of Micro Aerial Vehicles (MAVs) is crucial for enabling cooperative perception and control in autonomous aerial swarms. Yet, vision-based recognition models relying only on RGB data often fail to capture the complex spatiotemporal characteristics of MAV motion, which limits their ability to distinguish different actions. To overcome this problem, this paper presents MAVR-Net, a multi-view learning-based MAV action recognition framework. Unlike traditional single-view methods, the proposed approach combines three complementary types of data (raw RGB frames, optical flow, and segmentation masks) to improve the robustness and accuracy of MAV motion recognition. Specifically, ResNet-based encoders are used to extract discriminative features from each view, and a multi-scale feature pyramid is adopted to preserve the spatiotemporal details of MAV motion patterns. To enhance the interaction between different views, a cross-view attention module is introduced to model the dependencies among various modalities and feature scales. In addition, a multi-view alignment loss is designed to ensure semantic consistency and strengthen cross-view feature representations. Experimental results on benchmark MAV action datasets show that our method clearly outperforms existing approaches, achieving 97.8%, 96.5%, and 92.8% accuracy on the Short MAV, Medium MAV, and Long MAV datasets, respectively.
Authors: Nengbo Zhang, Hann Woei Ho
Submitted: October 17, 2025
arXiv category: cs.CV
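
The abstract mentions a multi-view alignment loss that encourages semantic consistency across views but does not give its formula. The sketch below is only one plausible form, assumed for illustration: the mean cosine distance over all pairs of view embeddings. The function names, the choice of cosine distance, and the toy embeddings are all assumptions and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (small epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def alignment_loss(embeddings):
    """Hypothetical alignment loss: mean (1 - cosine similarity) over all unordered view pairs.

    `embeddings` maps a view name (e.g. "rgb", "flow", "mask") to its feature vector.
    The loss is near 0 when all views point in the same direction and grows as they diverge.
    """
    pairs = list(combinations(embeddings.values(), 2))
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings (made up): the first set is perfectly aligned, the second is orthogonal.
aligned = {"rgb": [1.0, 2.0, 3.0], "flow": [2.0, 4.0, 6.0], "mask": [0.5, 1.0, 1.5]}
misaligned = {"rgb": [1.0, 0.0, 0.0], "flow": [0.0, 1.0, 0.0], "mask": [0.0, 0.0, 1.0]}

print(alignment_loss(aligned))
print(alignment_loss(misaligned))
```

In a training setup, a term like this would typically be added to the classification loss with a weighting coefficient; whether MAVR-Net does so, and with what weight, is not stated in this summary.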

Key Contributions

MAVR-Net is a multi-view learning framework for MAV action recognition that combines RGB frames, optical flow, and segmentation masks. It uses cross-view attention to enhance interaction between views, improving robustness and accuracy for complex MAV motions.
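
To make the idea of cross-view attention concrete, here is a minimal sketch in plain Python. It is not the paper's module: it omits the learned query/key/value projections, the multi-scale feature pyramid, and the ResNet encoders, and it treats each view as a single feature vector. The function names and the toy feature values are assumptions chosen for illustration. Each view queries the other two with scaled dot-product attention and adds the attended context back to its own features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_view_attention(views):
    """Fuse per-view feature vectors with scaled dot-product attention.

    `views` maps a view name to a feature vector; all vectors share one length.
    Each view acts as the query and attends over the *other* views, and the
    attended context is added to the query as a residual.
    Returns (fused_features, attention_weights).
    """
    dim = len(next(iter(views.values())))
    scale = math.sqrt(dim)
    fused, weights = {}, {}
    for name, query in views.items():
        others = [n for n in views if n != name]
        scores = [dot(query, views[n]) / scale for n in others]
        attn = softmax(scores)
        context = [
            sum(w * views[n][i] for w, n in zip(attn, others))
            for i in range(dim)
        ]
        fused[name] = [q + c for q, c in zip(query, context)]
        weights[name] = dict(zip(others, attn))
    return fused, weights


# Toy per-view features (made up); in practice these would come from the encoders.
features = {
    "rgb": [1.0, 0.0, 0.5, 0.2],
    "flow": [0.9, 0.1, 0.4, 0.3],
    "mask": [0.0, 1.0, 0.0, 0.1],
}
fused, weights = cross_view_attention(features)
print(weights["rgb"])
```

With these toy values the RGB view places more weight on the optical-flow view than on the mask view, because their feature vectors are more similar. A learned module would shape these weights through trained projections rather than raw feature similarity.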

Business Value

Enhances the capabilities of autonomous aerial swarms for tasks like coordinated surveillance, formation flying, and complex maneuvers, leading to more sophisticated drone applications.