Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning

Abstract

In this work, we address the challenge of building fair English ASR systems for second-language speakers. Our analysis of two widely used ASR models, Whisper and SeamlessM4T, reveals large fluctuations in word error rate (WER) across 26 accent groups, indicating significant fairness gaps. To mitigate this, we propose fairness-prompted finetuning with lightweight adapters, incorporating Spectral Decoupling (SD), Group Distributionally Robust Optimization (Group-DRO), and Invariant Risk Minimization (IRM). Our proposed fusion of traditional empirical risk minimization (ERM) with cross-entropy and fairness-driven objectives (SD, Group-DRO, and IRM) enhances fairness across accent groups while maintaining overall recognition accuracy. In terms of macro-averaged WER, our approach achieves relative improvements of 58.7% and 58.5% over the large pretrained Whisper and SeamlessM4T models, respectively, and of 9.7% and 7.8% over the same models finetuned with standard ERM and cross-entropy loss.
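
The headline numbers are relative improvements in macro-averaged WER. The sketch below shows one plausible way to compute that metric; it is not the authors' evaluation code. It assumes that each accent group's WER is the pooled word-level edit distance divided by the pooled reference word count, that the macro average is the unweighted mean over groups, and that every group has at least one reference word. The function names `word_errors`, `macro_wer`, and `relative_improvement` are hypothetical.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, keeping only one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1], len(ref)


def macro_wer(samples):
    """samples: iterable of (accent_group, reference, hypothesis) triples.

    Assumption: per-group WER pools errors and words within the group,
    and the macro average weights every group equally.
    """
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[group] += e
        words[group] += n
    per_group = {g: errors[g] / words[g] for g in errors}
    return sum(per_group.values()) / len(per_group), per_group


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer
```

Because every group carries equal weight, a model that does well on large accent groups but poorly on small ones is penalized more under this metric than under a pooled (micro-averaged) WER, which is why it suits a fairness comparison.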
Authors (6)
Monorama Swain
Bubai Maji
Jagabandhu Mishra
Markus Schedl
Anders Søgaard
Jesper Rindom Jensen
Submitted
October 21, 2025
arXiv Category
cs.CL

Key Contributions

This paper proposes a fairness-prompted finetuning method using lightweight adapters to build fairer English ASR systems for second-language speakers. By combining traditional ERM with fairness objectives (SD, Group-DRO, IRM), the approach significantly reduces Word Error Rate (WER) disparities across 26 accent groups while maintaining overall accuracy, outperforming standard finetuning.
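
The combination of ERM with the three fairness objectives can be illustrated with a small numeric sketch. It is not the paper's implementation: the weighted-sum form, the coefficients `lam_sd`, `lam_dro`, `lam_irm`, the step size `eta`, and the function names are all assumptions made for illustration. It uses the commonly cited forms of each objective: SD as an L2 penalty on the logits, Group-DRO as an exponentiated-gradient reweighting of per-group risks, and an IRMv1-style penalty (the squared gradient of the group risk with respect to a dummy scale on the logits). A real system would apply these to token-level logits inside an autograd framework.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def irm_penalty(batch):
    """IRMv1-style penalty for one group (assumed form).

    d/dw CE(w * z, y) at w = 1 equals sum_k (softmax(z)_k - 1[k == y]) * z_k;
    the penalty is the square of that gradient averaged over the batch.
    """
    grads = []
    for logits, label in batch:
        p = softmax(logits)
        grads.append(sum((p[k] - (k == label)) * z for k, z in enumerate(logits)))
    return (sum(grads) / len(grads)) ** 2


def fused_loss(groups, q, eta=0.1, lam_sd=0.01, lam_dro=1.0, lam_irm=1.0):
    """groups: {accent_group: [(logits, label), ...]}; q: {accent_group: weight}.

    Returns (total loss, updated Group-DRO weights). All coefficients are
    hypothetical defaults, not values from the paper.
    """
    examples = [ex for batch in groups.values() for ex in batch]
    risks = {g: sum(cross_entropy(z, y) for z, y in b) / len(b)
             for g, b in groups.items()}
    # ERM: average cross-entropy over all examples, ignoring group membership.
    erm = sum(cross_entropy(z, y) for z, y in examples) / len(examples)
    # SD: half the mean squared L2 norm of the logits.
    sd = 0.5 * sum(sum(v * v for v in z) for z, _ in examples) / len(examples)
    # Group-DRO: upweight groups with higher risk, then renormalize.
    new_q = {g: q[g] * math.exp(eta * risks[g]) for g in groups}
    norm = sum(new_q.values())
    new_q = {g: w / norm for g, w in new_q.items()}
    dro = sum(new_q[g] * risks[g] for g in groups)
    # IRM: mean of the per-group invariance penalties.
    irm = sum(irm_penalty(b) for b in groups.values()) / len(groups)
    return erm + lam_sd * sd + lam_dro * dro + lam_irm * irm, new_q
```

With two groups where one is classified correctly and the other is not, the returned weights shift toward the worse group, so a later step places more emphasis on the accent group with the higher loss.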

Business Value

Enables the development of more inclusive and equitable voice-enabled technologies, expanding market reach to global users and improving user experience for non-native speakers in applications like customer service, virtual assistants, and dictation software.