Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

Abstract

Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. On the other hand, beam search is the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and speech translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that the accuracy of MBR decoding outperforms that of beam search in most of the experimental settings we have evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at https://github.com/CyberAgentAILab/mbr-for-asr
Author: Yuu Jinnai
Submitted: October 22, 2025
arXiv category: cs.CL

Key Contributions

Evaluates sample-based Minimum Bayes Risk (MBR) decoding for automatic speech recognition (ASR) and speech translation (ST) on English and Japanese, using Whisper and its derivative models. MBR decoding is more accurate than beam search in most of the evaluated settings, which suggests it is a promising alternative for offline speech tasks that require high accuracy.
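
To make the idea concrete, here is a minimal sketch of sample-based MBR selection over a set of candidate transcripts. It is an illustration under stated assumptions, not the paper's implementation: the function names (`word_edit_distance`, `wer`, `mbr_decode`) are hypothetical, word error rate is assumed as the risk function, and every sample is weighted equally. This summary does not say which utility function or sampling setup the paper uses. In practice the candidates would be sampled from an ASR model such as Whisper; here they are hard-coded strings.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference word
                            curr[j - 1] + 1,      # add a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of `hyp` measured against `ref`."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def mbr_decode(samples: List[str]) -> str:
    """Return the sample with the lowest average WER against all samples.

    Assumption: the samples double as pseudo-references and are weighted
    uniformly, giving a Monte Carlo estimate of the expected risk.
    """
    if not samples:
        raise ValueError("need at least one sampled hypothesis")

    def expected_risk(candidate: str) -> float:
        return sum(wer(ref, candidate) for ref in samples) / len(samples)

    return min(samples, key=expected_risk)


# Hard-coded stand-ins for transcripts sampled from an ASR model.
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat sat on the mat",
    "a cat sits on the mat",
]
print(mbr_decode(samples))  # the cat sat on the mat
```

Unlike beam search, which returns the hypothesis the model scores highest, this procedure returns the hypothesis that agrees most with the other samples. The cost is one risk evaluation per pair of samples, which is one reason the approach suits offline rather than streaming use.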

Business Value

More accurate offline transcription and translation could improve the reliability and user experience of voice-enabled applications and content localization services.
