Research Paper · Intended audience: NLP researchers, speech processing engineers, dialogue system developers, AI researchers

Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking

Abstract

Dialogue State Tracking (DST) is a key component of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named-entity errors introduced by Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our method can control the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, it generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
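
To make the idea concrete, here is a minimal Python sketch (not the authors' code) of keyword-highlighted prompting: slot-value keywords are bracketed in the utterance, and an LLM is asked to replace only those spans with phonetically similar errors. `build_prompt`, `augment_turn`, and `call_llm` are illustrative names; `call_llm` stands in for whatever text-generation client you use.

```python
# Hedged sketch of keyword-highlighted error augmentation, assuming an
# arbitrary LLM client exposed as a plain callable `call_llm(prompt) -> str`.

def build_prompt(utterance: str, keywords: list[str]) -> str:
    """Mark slot-value keywords so the LLM knows where errors may be placed."""
    highlighted = utterance
    for kw in keywords:
        highlighted = highlighted.replace(kw, f"[{kw}]")  # bracket-highlight each keyword
    return (
        "Rewrite the utterance as if it were transcribed by a noisy ASR system.\n"
        "Replace ONLY the bracketed keywords with phonetically similar words "
        "(e.g., 'Cambridge' -> 'Camebrige'); leave all other words unchanged.\n"
        f"Utterance: {highlighted}\n"
        "Rewritten:"
    )

def augment_turn(utterance: str, keywords: list[str], call_llm) -> str:
    """Return one ASR-style noisy variant of the utterance."""
    return call_llm(build_prompt(utterance, keywords)).strip()

# Example usage with any text-generation callable:
# noisy = augment_turn("I need a taxi to Cambridge station",
#                      keywords=["Cambridge"], call_llm=my_llm_client)
```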
Authors: Jihyun Lee, Solee Im, Wonjun Lee, Gary Geunbae Lee
Submitted: September 10, 2024
arXiv Category: cs.CL

Key Contributions

Introduces 'Speak & Spell', an LLM-driven data augmentation method for improving Dialogue State Tracking (DST) robustness against Automatic Speech Recognition (ASR) errors. The method prompts an LLM with keyword-highlighted utterances so that phonetically similar errors are injected only at slot-value keywords, generating sufficient error patterns to improve DST accuracy in noisy and low-accuracy ASR environments. A rough illustration of how the generated variants could be kept phonetically plausible is sketched below.
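
As a rough illustration (my sketch, not a procedure described in the paper), a simple filter can compare a simplified Soundex code of each LLM-generated error against the original keyword and discard variants that no longer sound alike.

```python
# Hedged sketch: keep only generated error variants whose simplified Soundex
# code matches the original keyword, so augmented data stays phonetically
# plausible. This is an illustration, not part of the Speak & Spell method.

_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
          **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(word: str) -> str:
    """Simplified Soundex: first letter plus up to three consonant-class digits."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits = [_CODES.get(ch, "") for ch in word]
    code = word[0].upper()
    prev = digits[0]
    for d in digits[1:]:
        if d and d != prev:          # skip vowels and collapse repeated classes
            code += d
        prev = d
    return (code + "000")[:4]

def is_phonetic_variant(original: str, variant: str) -> bool:
    """True if the generated variant sounds like the original keyword."""
    return soundex(original) == soundex(variant)

# e.g. is_phonetic_variant("Cambridge", "Camebrige") -> True
#      is_phonetic_variant("Cambridge", "London")    -> False
```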

Business Value

Enhances the reliability and user experience of voice-based applications (e.g., virtual assistants, call center bots) by making them more resilient to speech recognition inaccuracies.