Abstract
Conventional Convolutional Neural Networks (CNNs) in the real domain have
been widely used for audio classification. However, their convolution
operations process multi-channel inputs independently, limiting the ability to
capture correlations among channels. This can lead to suboptimal feature
learning, particularly for complex audio patterns such as multi-channel
spectrogram representations. Quaternion Convolutional Neural Networks (QCNNs)
address this limitation by employing quaternion algebra to jointly capture
inter-channel dependencies, enabling more compact models with fewer learnable
parameters while better exploiting the multi-dimensional nature of audio
signals. However, QCNNs exhibit higher computational complexity due to the
overhead of quaternion operations, resulting in increased inference latency and
reduced efficiency compared to conventional CNNs, posing challenges for
deployment on resource-constrained platforms. To address this challenge, this
study explores knowledge distillation (KD) and pruning to reduce the
computational complexity of QCNNs while maintaining performance. Our
experiments on audio classification reveal that pruning QCNNs achieves similar
or superior performance compared to KD while requiring less computational
effort. Compared to conventional CNNs and Transformer-based architectures,
pruned QCNNs achieve competitive performance with a reduced learnable parameter
count and computational complexity. On the AudioSet dataset, pruned QCNNs
reduce computational cost by 50% and parameter count by 80%, while
maintaining performance comparable to conventional CNNs. Furthermore,
pruned QCNNs generalize well across multiple audio classification benchmarks,
including GTZAN for music genre recognition, ESC-50 for environmental sound
classification, and RAVDESS for speech emotion recognition.
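To make the parameter saving concrete, the sketch below shows how a quaternion convolution can be realized by splitting channels into four quaternion components and combining them through the Hamilton product. This is an illustrative PyTorch implementation, not the authors' code; the class name QuaternionConv2d and its internals are assumptions, but the shared-weight Hamilton-product structure is what yields roughly a four-fold reduction in learnable weights relative to a real-valued convolution of the same width.

```python
import torch
import torch.nn as nn


class QuaternionConv2d(nn.Module):
    """Minimal sketch of a quaternion 2D convolution (hypothetical, for
    illustration only). Input and output channels are split into four
    quaternion components (r, i, j, k); the four component filters are
    reused across components via the Hamilton product, so the layer needs
    roughly 1/4 the parameters of a real-valued conv of the same size."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        c_in, c_out = in_channels // 4, out_channels // 4
        # One real-valued convolution per quaternion component of the weight.
        self.r = nn.Conv2d(c_in, c_out, kernel_size, bias=False, **kwargs)
        self.i = nn.Conv2d(c_in, c_out, kernel_size, bias=False, **kwargs)
        self.j = nn.Conv2d(c_in, c_out, kernel_size, bias=False, **kwargs)
        self.k = nn.Conv2d(c_in, c_out, kernel_size, bias=False, **kwargs)

    def forward(self, x):
        # Split the input channels into the four quaternion components.
        xr, xi, xj, xk = torch.chunk(x, 4, dim=1)
        # Hamilton product between the quaternion weight and quaternion input:
        # this couples all four components in every output, capturing
        # inter-channel dependencies jointly.
        yr = self.r(xr) - self.i(xi) - self.j(xj) - self.k(xk)
        yi = self.r(xi) + self.i(xr) + self.j(xk) - self.k(xj)
        yj = self.r(xj) - self.i(xk) + self.j(xr) + self.k(xi)
        yk = self.r(xk) + self.i(xj) - self.j(xi) + self.k(xr)
        return torch.cat([yr, yi, yj, yk], dim=1)
```

Swapping the real-valued convolutions of a CNN backbone for such a layer leaves the overall architecture unchanged while cutting the convolutional parameter count by about 75%; the extra additions in the Hamilton product are the source of the inference overhead that the paper then targets with distillation and pruning.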
Authors (3)
Arshdeep Singh
Vinayak Abrol
Mark D. Plumbley
Submitted
October 24, 2025
Key Contributions
Addresses the challenge of high computational complexity in Quaternion Convolutional Neural Networks (QCNNs) for audio classification. The study explores methods to compress QCNNs, aiming to make them more efficient and suitable for deployment on resource-constrained platforms while retaining their ability to capture inter-channel dependencies.
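As a minimal sketch of the kind of compression explored here, the snippet below applies PyTorch's built-in structured L1 pruning to an illustrative convolutional backbone. The layer sizes, the 50% per-layer ratio, and the use of plain Conv2d modules are assumptions for illustration; a quaternion layer would prune filters in groups of four components, and the paper's actual pruning criterion and ratios may differ.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical audio classification backbone; any model with Conv2d layers works.
model = nn.Sequential(
    nn.Conv2d(4, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 527),  # e.g. 527 AudioSet classes
)

# Structured pruning: zero out the 50% of output filters with the smallest
# L1 norm in each convolutional layer, reducing both parameters and
# multiply-accumulate operations without retraining the architecture.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.ln_structured(module, name="weight", amount=0.5, n=1, dim=0)
        prune.remove(module, "weight")  # bake the pruning mask into the weights
```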
Business Value
Enables the deployment of more powerful audio analysis models on edge devices and embedded systems, leading to more intelligent and responsive audio applications in various consumer electronics and IoT devices.