📄 Abstract
Whisper models have achieved remarkable progress in speech recognition, yet
their large size remains a bottleneck for deployment on resource-constrained
edge devices. This paper proposes a framework for designing fine-tuned variants of
Whisper that address this problem. Structured sparsity is enforced via
the Sparse Group LASSO penalty, used as a loss regularizer, to reduce the number of
floating-point operations (FLOPs). Further, a weight-statistics-aware pruning
algorithm is proposed. We also design a custom text normalizer for WER
evaluation. On the Common Voice 11.0 Hindi dataset, we obtain, without degrading
WER, (a) a 35.4% reduction in model parameters, 14.25% lower memory consumption,
and 18.5% fewer FLOPs on Whisper-small; (b) a 31% reduction in model
parameters, 15.29% lower memory consumption, and 16.95% fewer FLOPs on
Whisper-medium; and (c) a substantial improvement over the state-of-the-art
Iterative Magnitude Pruning based method, pruning 18.7% more parameters along
with a 12.31 reduction in WER.
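To make the regularization idea concrete, below is a minimal sketch of adding a Sparse Group LASSO penalty to a training loss in PyTorch. It assumes the rows of each linear weight matrix are the sparsity groups; the group definition, `alpha`, and `lambda_sgl` are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: Sparse Group LASSO (SGL) penalty as a loss regularizer.
# Assumption: groups = output rows of every nn.Linear weight matrix.
import torch
import torch.nn as nn


def sparse_group_lasso(model: nn.Module, alpha: float = 0.5) -> torch.Tensor:
    """SGL penalty: alpha * L1 term + (1 - alpha) * sum of group L2 norms."""
    device = next(model.parameters()).device
    l1_term = torch.tensor(0.0, device=device)
    group_term = torch.tensor(0.0, device=device)
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            l1_term = l1_term + w.abs().sum()
            # Treat each output row as a group; scale by sqrt(group size).
            group_term = group_term + (w.shape[1] ** 0.5) * w.norm(p=2, dim=1).sum()
    return alpha * l1_term + (1.0 - alpha) * group_term


# Usage inside a training step (task_loss is the usual ASR objective;
# lambda_sgl is a hypothetical hyperparameter):
# loss = task_loss + lambda_sgl * sparse_group_lasso(model, alpha=0.5)
# loss.backward()
```

Driving whole groups of weights to zero, rather than individual entries, is what allows structured (rather than unstructured) sparsity and hence real FLOP reductions.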
Authors (4)
Prasenjit K Mudi
Anshi Sachan
Dahlia Devapriya
Sheetal Kalyani
Submitted
October 14, 2025
Key Contributions
Proposes a framework for designing fine-tuned, efficient variants of Whisper models for edge devices. It employs structured sparsity via Sparse Group LASSO and a weight-statistics-aware pruning algorithm to significantly reduce model parameters, memory, and FLOPs without degrading WER, outperforming state-of-the-art pruning methods.
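As an illustration of what "weight-statistics-aware" pruning could look like in code, the sketch below zeroes out rows of each linear layer whose mean absolute weight falls below a multiple of that layer's weight standard deviation. The grouping, the statistic, and the threshold `k` are assumptions for illustration only; the paper's actual algorithm may differ.

```python
# Sketch: statistics-driven structured pruning (illustrative, not the
# paper's exact method). Rows with weak mean-|weight| relative to the
# layer's weight std are zeroed out.
import torch
import torch.nn as nn


@torch.no_grad()
def prune_by_weight_stats(model: nn.Module, k: float = 0.25) -> int:
    """Zero out weak rows in every nn.Linear; return the number of pruned rows."""
    pruned_rows = 0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            threshold = k * w.std()
            row_score = w.abs().mean(dim=1)          # per-row statistic
            mask = (row_score >= threshold).float()  # keep "strong" rows
            w.mul_(mask.unsqueeze(1))                # zero out weak rows
            if module.bias is not None:
                module.bias.mul_(mask)
            pruned_rows += int((mask == 0).sum())
    return pruned_rows
```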
Business Value
Enables the deployment of advanced speech recognition capabilities on low-power devices, opening up new applications in voice assistants, real-time transcription, and accessibility tools for edge environments.