
Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition


Abstract

Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. On the other hand, beam search is the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and speech translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that MBR decoding is more accurate than beam search in most of the experimental settings we evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at https://github.com/CyberAgentAILab/mbr-for-asr
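For reference, the sample-based MBR rule the abstract refers to is commonly written as below. The notation is an assumption made here for illustration and is not taken from the paper: x is the input audio, H is a set of N hypotheses sampled from the model, and u(h, y) is a utility function that scores hypothesis h against pseudo-reference y.

```latex
\[
  y^{\mathrm{MBR}}
  = \operatorname*{arg\,max}_{h \in \mathcal{H}}
    \frac{1}{|\mathcal{H}|} \sum_{y \in \mathcal{H}} u(h, y),
  \qquad
  \mathcal{H} = \{y_1, \dots, y_N\}, \quad y_i \sim P_\theta(\cdot \mid x)
\]
```

Beam search, by contrast, approximates the single most probable output under the model rather than the output with the highest expected utility.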

Authors (1)

Yuu Jinnai

Submitted

October 22, 2025

arXiv Category

cs.CL


Key Contributions

Evaluates Minimum Bayes Risk (MBR) decoding for automatic speech recognition (ASR) and speech translation (ST) on English and Japanese using Whisper and its derivative models, and finds that it is more accurate than beam search in most of the evaluated settings. This suggests MBR decoding is a promising alternative for offline speech processing where high accuracy matters.
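A minimal sketch of the selection step in sample-based MBR decoding may help make the idea concrete. It assumes the candidate transcripts have already been sampled from an ASR model (that step is not shown), and it uses negative word error rate as the utility, which is an illustrative choice rather than the utility confirmed by the paper. The function names `word_edit_distance`, `neg_wer`, and `mbr_decode` are hypothetical.

```python
from typing import Callable, List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def neg_wer(hyp: str, ref: str) -> float:
    """Assumed utility: negative WER of hyp against a pseudo-reference."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        return 0.0 if not hyp_words else -1.0
    return -word_edit_distance(ref_words, hyp_words) / len(ref_words)


def mbr_decode(samples: List[str],
               utility: Callable[[str, str], float] = neg_wer) -> str:
    """Return the sample with the highest average utility over all samples."""
    if not samples:
        raise ValueError("samples must be non-empty")

    def expected_utility(h: str) -> float:
        return sum(utility(h, y) for y in samples) / len(samples)

    return max(samples, key=expected_utility)


if __name__ == "__main__":
    # Hypothetical sampled transcripts for one utterance.
    candidates = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "the cat sat on the mat",
        "a cat sits on the mat",
    ]
    print(mbr_decode(candidates))
```

The selected output is the candidate that agrees most with the other samples on average, which is why MBR decoding tends to favor a consensus transcript over a single high-probability one.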

Business Value

Leads to more accurate transcription and translation services, improving user experience and reliability for voice-enabled applications and content localization.

Paper Metadata

Innovation Type

Algorithmic Evaluation

Deployment Feasibility

High, as the method is evaluated on existing models (Whisper) and code is available, facilitating adoption.

Limitations Addressed

Beam search is the standard but potentially suboptimal decoding method for ASR/ST; lack of empirical evidence for MBR's effectiveness in speech tasks

Performance Gains

MBR decoding outperforms beam search in accuracy for ASR and ST tasks in most evaluated experimental settings.


Technical Tags

Automatic Speech Recognition (ASR), Minimum Bayes Risk (MBR) Decoding, Beam Search, Speech Translation (ST), Text-to-Text Generation, Sequence Decoding, Whisper Model, Accuracy Improvement, Offline ASR, Code Availability

Research Topics

Speech Processing, Natural Language Processing, Sequence Modeling, Decoding Algorithms

Methods & Architectures

Minimum Bayes Risk (MBR) Decoding, Evaluation on ASR and ST tasks, Comparison with Beam Search, Whisper, Whisper derivative models

Applications & Tasks

Speech Recognition, Machine Translation, Voice Assistants, Transcription Services, Improving accuracy of ASR and ST, Evaluating MBR decoding for speech tasks, Replacing beam search with a potentially superior decoding method, Automatic Speech Recognition (ASR), Speech Translation (ST)

Related Fields

Speech Processing, Natural Language Processing, Machine Learning, Information Theory

Keywords

ASR, Speech Recognition, MBR Decoding, Minimum Bayes Risk, Beam Search, Speech Translation, Sequence Decoding, Whisper, Accuracy, Offline ASR, NLP, Speech Processing, Decoding Algorithms

Academic Context

#Speech Processing, #Natural Language Processing, #Sequence Modeling, #Decoding Algorithms

Companies & Organizations

Companies Mentioned

CyberAgent

Commercial Potential

Potential Products

High-accuracy ASR systems, Improved speech translation services, More reliable voice command interfaces

Target Industries

Technology, Media, Telecommunications, Customer Service, Healthcare

Use Case Examples

Accurate transcription of legal or medical dictations; real-time speech translation for international conferences; improving voice control accuracy in noisy environments

Competitive Edge

Presents MBR decoding as a more accurate alternative to beam search for offline ASR and ST tasks, based on gains in most of the evaluated settings.

Resource Requirements

Compute Needs

Requires compute for running Whisper models and for MBR decoding, which samples multiple hypotheses per input and scores them against one another.

Data Requirements

Requires speech data for ASR/ST tasks (English, Japanese).

Deployment Constraints

MBR decoding might be computationally more intensive than beam search, impacting real-time performance in some scenarios.
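A rough way to see where the extra cost comes from: with N sampled hypotheses, a straightforward MBR implementation scores every hypothesis against every other one, so the number of utility evaluations grows quadratically with N, on top of the cost of generating the N samples. The sketch below only counts those evaluations; the helper name `mbr_utility_calls` is hypothetical, and the count assumes a naive pairwise implementation with no pruning or caching.

```python
def mbr_utility_calls(num_samples: int, skip_self: bool = True) -> int:
    """Count pairwise utility evaluations in a naive sample-based MBR pass.

    skip_self=True leaves out comparing a hypothesis with itself.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if skip_self:
        return num_samples * (num_samples - 1)
    return num_samples ** 2


if __name__ == "__main__":
    # Illustrative sample sizes, not values taken from the paper.
    for n in (4, 16, 64):
        print(n, mbr_utility_calls(n))
```

This quadratic term is one reason the method is better suited to offline transcription than to latency-sensitive use.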

Scalability

Scalability depends on the underlying ASR/ST model and the efficiency of the MBR implementation.

Production Readiness

Maturity Level

Research
