This work enhances the naturalness of LLM-generated utterances for speech synthesis by inserting disfluencies. The approach has two stages: an LLM is fine-tuned with LoRA so that it produces utterances containing disfluencies, and these utterances are then synthesized with a TTS model. In user studies, this significantly increased perceived spontaneity.
The method improves the user experience of voice-based AI systems (e.g., chatbots, virtual assistants) by making their speech sound more human-like and engaging.
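To make the LoRA fine-tuning step mentioned above more concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is not the paper's code: the class name `LoRALinear`, the helper `matvec`, and the `rank` and `alpha` values are hypothetical and chosen only for illustration. The sketch shows the general idea behind LoRA, in which the pretrained weight stays frozen and only a small low-rank update is trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Illustrative only: names and hyperparameters are assumptions,
    not taken from the summarized paper.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank     # scaling applied to the low-rank update
        # A starts with small random values and B starts at zero,
        # so the adapter leaves the layer's output unchanged before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                # W x
        update = matvec(self.B, matvec(self.A, x))   # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_param_count(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 identity weight with a rank-1 adapter.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1, alpha=2)
output = layer.forward([1, 2, 3, 4])
```

With `rank=1`, the adapter trains 8 values instead of the 16 in the full weight matrix; at the scale of an LLM this gap is much larger, which is what makes fine-tuning for a narrow behavior such as disfluency insertion comparatively cheap. In practice a fine-tuning library would supply the adapter, and the fine-tuned model's text output would then be passed to a TTS model.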