VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency

Abstract

While Speech Large Language Models (Speech-LLMs) show strong performance in many applications, their robustness is critically under-tested, especially with respect to speech disfluency. Existing evaluations often rely on idealized inputs and overlook common disfluencies, particularly those associated with conditions such as Parkinson's disease. This work investigates whether current Speech-LLMs can maintain performance when interacting with users who have speech impairments. To facilitate this inquiry, we introduce VocalBench-DF, a framework for the systematic evaluation of disfluency across a multi-dimensional taxonomy. Our evaluation of 22 mainstream Speech-LLMs reveals substantial performance degradation, indicating that their real-world readiness is limited. Further analysis identifies phoneme-level processing and long-context modeling as the primary bottlenecks responsible for these failures. Strengthening recognition and reasoning capabilities at both the component and pipeline levels can substantially improve robustness. These findings highlight the urgent need for new methods to improve disfluency handling and build truly inclusive Speech-LLMs.
Authors (6)
Hongcheng Liu
Yixuan Hou
Heyang Liu
Yuhao Wang
Yanfeng Wang
Yu Wang
Submitted
October 17, 2025
arXiv Category
cs.CL

Key Contributions

This paper introduces VocalBench-DF, a benchmark for evaluating the robustness of Speech-LLMs to disfluency, particularly for users with speech impairments such as those associated with Parkinson's disease. An evaluation of 22 mainstream Speech-LLMs reveals substantial performance degradation, which indicates that their real-world readiness is limited. The analysis identifies phoneme-level processing and long-context modeling as the key bottlenecks.

Business Value

Robust disfluency handling helps ensure that voice-enabled technologies are accessible and reliable for all users, including those with speech impairments. This promotes inclusivity and expands market reach.