
MAVR-Net: Robust Multi-View Learning for MAV Action Recognition with Cross-View Attention


Abstract

Recognizing the motion of Micro Aerial Vehicles (MAVs) is crucial for enabling cooperative perception and control in autonomous aerial swarms. Yet vision-based recognition models that rely only on RGB data often fail to capture the complex spatiotemporal characteristics of MAV motion, which limits their ability to distinguish different actions. To overcome this problem, this paper presents MAVR-Net, a multi-view learning framework for MAV action recognition. Unlike traditional single-view methods, the proposed approach combines three complementary types of data (raw RGB frames, optical flow, and segmentation masks) to improve the robustness and accuracy of MAV motion recognition. Specifically, ResNet-based encoders extract discriminative features from each view, and a multi-scale feature pyramid preserves the spatiotemporal details of MAV motion patterns. To enhance the interaction between views, a cross-view attention module models the dependencies among modalities and feature scales. In addition, a multi-view alignment loss is designed to ensure semantic consistency and strengthen cross-view feature representations. Experimental results on benchmark MAV action datasets show that the method clearly outperforms existing approaches, achieving 97.8%, 96.5%, and 92.8% accuracy on the Short MAV, Medium MAV, and Long MAV datasets, respectively.
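The abstract does not state the form of the multi-view alignment loss. As a minimal sketch of one common way to encourage semantic consistency across views, the example below averages (1 - cosine similarity) over all pairs of per-view embeddings. The function names, the choice of cosine distance, and the toy embeddings are assumptions made for illustration, not the paper's definition.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(view_embeddings):
    """Mean of (1 - cosine similarity) over all pairs of view embeddings.

    Hypothetical stand-in for an alignment loss: it is 0 when all views point
    in the same direction and grows as the views disagree.
    """
    pairs = list(combinations(view_embeddings.values(), 2))
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy per-view embeddings (made-up values): RGB and mask agree, flow is orthogonal.
embeddings = {
    "rgb": [1.0, 0.0],
    "flow": [0.0, 1.0],
    "mask": [1.0, 0.0],
}
loss = alignment_loss(embeddings)
```

In a training setup, a term like this would typically be added to the classification loss with a weighting factor; whether MAVR-Net does so is not stated in the abstract.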

Authors (2)

Nengbo Zhang

Hann Woei Ho

Submitted

October 17, 2025

arXiv Category

cs.CV


Key Contributions

MAVR-Net is a multi-view learning framework for MAV action recognition that combines RGB frames, optical flow, and segmentation masks. It uses cross-view attention to enhance interaction between views, improving robustness and accuracy for complex MAV motions.
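To make the idea of cross-view attention concrete, here is a dependency-free sketch of scaled dot-product attention in which tokens from one view (for example RGB) act as queries and tokens from another view (for example optical flow) act as keys and values. It omits learned projections, multiple heads, and the multi-scale pyramid, and the function names and toy features are hypothetical; the paper's actual module may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(query_tokens, context_tokens):
    """Attend from one view's tokens (queries) to another view's tokens.

    Both arguments are lists of feature vectors with the same dimension.
    Learned projections are omitted, so the context tokens serve directly
    as both keys and values.
    """
    dim = len(query_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in query_tokens:
        # Similarity of this query to every token of the other view.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context_tokens]
        weights = softmax(scores)
        # Weighted average of the other view's tokens.
        attended.append([
            sum(w * v[d] for w, v in zip(weights, context_tokens))
            for d in range(dim)
        ])
    return attended


# Toy features (made-up values): 2 RGB tokens attend to 3 optical-flow tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
flow_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_view_attention(rgb_tokens, flow_tokens)
```

Each output row is a convex combination of the flow tokens, weighted by how strongly the corresponding RGB token matches them. The output has one row per query token and keeps the feature dimension.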

Business Value

Enhances the capabilities of autonomous aerial swarms for tasks like coordinated surveillance, formation flying, and complex maneuvers, leading to more sophisticated drone applications.

Paper Metadata

Innovation Type

Architectural Innovation

Deployment Feasibility

Feasible for integration into MAV systems. Requires onboard processing capabilities or communication links for data transmission.

Limitations Addressed

Failure of single-view RGB models to capture complex spatial-temporal characteristics of MAV motion, limiting their ability to distinguish actions.

Performance Gains

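Improved robustness and accuracy in recognizing MAV actions by leveraging complementary multi-view data, with reported accuracy of 97.8%, 96.5%, and 92.8% on the Short MAV, Medium MAV, and Long MAV datasets, respectively.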

Technical Tags

MAV Action Recognition, Multi-View Learning, Cross-View Attention, RGB Frames, Optical Flow, Segmentation Masks, Feature Extraction, Feature Pyramid, Spatiotemporal Characteristics, Autonomous Swarms

Research Topics

Computer Vision, Robotics, Autonomous Systems, Video Analysis, Multi-modal Learning

Methods & Architectures

Multi-view learning, Cross-view attention module, ResNet-based feature encoders, Multi-scale feature pyramid, ResNet

Applications & Tasks

Robotics, Autonomous Swarms, Aerial Surveillance, Drone Technology, MAV Action Recognition, Robustness to Motion Complexity, Multi-modal Fusion, Motion Recognition

Related Fields

Robotics, Aerospace Engineering, Machine Learning, Sensor Fusion

Keywords

MAV Action Recognition, Multi-View Learning, Cross-View Attention, Drones, Robotics, Computer Vision, Video Understanding, Optical Flow, Segmentation, Autonomous Swarms

Academic Context

#Computer Vision, #Robotics, #Autonomous Systems, #Video Analysis, #Multi-modal Learning

Commercial Potential

Potential Products

MAV control systems with advanced perception, Autonomous drone fleet management software

Target Industries

Aerospace, Defense, Logistics, Agriculture, Surveillance

Use Case Examples

Enabling drones to identify specific actions or maneuvers for cooperative tasks; Improving autonomous navigation and control in complex aerial environments

Competitive Edge

A novel multi-view approach with cross-view attention specifically designed for the challenging task of MAV action recognition.

Resource Requirements

Data Requirements

Requires datasets of MAVs performing various actions, with RGB frames, optical flow, and segmentation masks available for each clip.
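A practical consequence of needing three views per clip is that they must stay aligned in time and space. The sketch below checks that alignment from array shapes alone. The shape layouts, the convention of one flow field per consecutive frame pair, and the function name are assumptions chosen for illustration; the source does not describe the dataset format.

```python
def check_view_shapes(rgb_shape, flow_shape, mask_shape):
    """Return a list of inconsistencies between the three views of one clip.

    Assumed layouts (hypothetical, not taken from the paper):
      rgb  : (T, H, W, 3)
      flow : (T - 1, H, W, 2), one field per consecutive frame pair
      mask : (T, H, W)
    """
    problems = []
    if len(rgb_shape) != 4 or rgb_shape[3] != 3:
        problems.append("rgb must be (T, H, W, 3)")
    if len(flow_shape) != 4 or flow_shape[3] != 2:
        problems.append("flow must be (T - 1, H, W, 2)")
    if len(mask_shape) != 3:
        problems.append("mask must be (T, H, W)")
    if problems:
        return problems  # Rank errors make the remaining checks meaningless.

    frames, height, width = rgb_shape[:3]
    if flow_shape[0] != frames - 1:
        problems.append("flow needs one field per consecutive frame pair")
    if mask_shape[0] != frames:
        problems.append("mask needs one entry per frame")
    same_size = (
        tuple(flow_shape[1:3]) == (height, width)
        and tuple(mask_shape[1:3]) == (height, width)
    )
    if not same_size:
        problems.append("all views must share the same spatial size")
    return problems


# A consistent clip and an inconsistent one (shapes are made-up examples).
ok = check_view_shapes((16, 224, 224, 3), (15, 224, 224, 2), (16, 224, 224))
bad = check_view_shapes((16, 224, 224, 3), (16, 224, 224, 2), (16, 112, 112))
```

Some pipelines pad optical flow to T fields instead of T - 1; in that case the frame-count check would need to change accordingly.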

Deployment Constraints

Onboard computational limitations for real-time processing on MAVs; data transmission bandwidth.

Production Readiness

Maturity Level

Research
