📄 Abstract
Intra-sentence multilingual speech synthesis (code-switching TTS) remains a
major challenge due to abrupt language shifts, varied scripts, and mismatched
prosody between languages. Conventional TTS systems are typically monolingual
and fail to produce natural, intelligible speech in mixed-language contexts. We
introduce Script-First Multilingual Synthesis with Adaptive Locale Resolution
(SFMS-ALR), an engine-agnostic framework for fluent, real-time code-switched
speech generation. SFMS-ALR segments input text by Unicode script, applies
adaptive language identification to determine each segment's language and
locale, and normalizes prosody using sentiment-aware adjustments to preserve
expressive continuity across languages. The algorithm generates a unified SSML
representation with appropriate "lang" or "voice" spans and synthesizes the
utterance in a single TTS request. Unlike end-to-end multilingual models,
SFMS-ALR requires no retraining and integrates seamlessly with existing voices
from Google, Apple, Amazon, and other providers. Comparative analysis with
data-driven pipelines such as Unicom and Mask LID demonstrates SFMS-ALR's
flexibility, interpretability, and immediate deployability. The framework
establishes a modular baseline for high-quality, engine-independent
multilingual TTS and outlines evaluation strategies for intelligibility,
naturalness, and user preference.
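
The script-first step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `DEFAULT_LOCALE` table, the function names, and the rule that neutral characters (spaces, digits, punctuation, combining marks) stay with the preceding run are all assumptions made for illustration, and a fixed script-to-locale lookup stands in for the adaptive language identification the abstract mentions. Only the Python standard library is used; the `<lang xml:lang="...">` element is standard SSML, though engine support for it varies.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical script-to-locale defaults. The paper describes adaptive
# language/locale resolution; this fixed table is only a stand-in.
DEFAULT_LOCALE = {
    "LATIN": "en-US",
    "DEVANAGARI": "hi-IN",
    "ARABIC": "ar-SA",
    "CYRILLIC": "ru-RU",
    "CJK": "zh-CN",
    "HIRAGANA": "ja-JP",
    "KATAKANA": "ja-JP",
    "HANGUL": "ko-KR",
}


def script_of(ch):
    """Return a coarse script label for a letter, or None for neutral characters."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation, combining marks
    name = unicodedata.name(ch, "")
    # Unicode character names start with the script, e.g. "DEVANAGARI LETTER NA".
    return name.split(" ")[0] if name else None


def segment_by_script(text):
    """Split text into (script, chunk) runs; neutral characters join the current run."""
    segments = []  # list of [script, chunk]
    for ch in text:
        script = script_of(ch)
        if segments and (script is None or script == segments[-1][0]):
            segments[-1][1] += ch
        elif segments and segments[-1][0] is None:
            # A leading neutral run adopts the first real script that follows it.
            segments[-1][0] = script
            segments[-1][1] += ch
        else:
            segments.append([script, ch])
    return [(script, chunk) for script, chunk in segments]


def to_ssml(text, fallback_locale="en-US"):
    """Wrap each script run in an SSML <lang> span inside a single <speak> document."""
    parts = []
    for script, chunk in segment_by_script(text):
        locale = DEFAULT_LOCALE.get(script, fallback_locale)
        parts.append(f'<lang xml:lang="{locale}">{escape(chunk)}</lang>')
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml("I love नमस्ते today"))
```

Because the whole utterance ends up in one SSML document, it can be sent to the engine as a single request, which is the property the abstract emphasizes. Script alone cannot separate languages that share a script (English and Spanish, for example), which is where the language identification stage would be needed.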
Submitted
October 27, 2025
Key Contributions
Introduces SFMS-ALR, an engine-agnostic framework for fluent, real-time code-switched speech synthesis that segments text by Unicode script, adaptively identifies each segment's language and locale, and normalizes prosody with sentiment-aware adjustments. The approach requires no retraining and works with existing TTS voices from Google, Apple, Amazon, and other providers.
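
One way to picture sentiment-aware prosody normalization is to derive a single set of prosody settings from an utterance-level sentiment score and apply it around every language span, so that rate and pitch do not jump at a language boundary. The Python sketch below is hypothetical: the function names, the score range of -1 to 1, and the 10% maximum shift are assumptions for illustration, not values from the paper. The `<prosody>` element with `rate` and `pitch` attributes is standard SSML, but the accepted value formats differ between engines.

```python
def prosody_for_sentiment(score, max_shift=10):
    """Map a sentiment score in [-1, 1] to SSML prosody attribute values.

    Assumption: positive sentiment speaks slightly faster and higher,
    negative sentiment slightly slower and lower, by at most max_shift percent.
    """
    score = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    shift = round(score * max_shift)
    return {"rate": f"{100 + shift}%", "pitch": f"{shift:+d}%"}


def wrap_with_prosody(lang_spans, score):
    """Wrap already-built <lang> spans in one shared <prosody> element."""
    attrs = prosody_for_sentiment(score)
    body = "".join(lang_spans)
    return (
        f'<speak><prosody rate="{attrs["rate"]}" pitch="{attrs["pitch"]}">'
        f"{body}</prosody></speak>"
    )


if __name__ == "__main__":
    spans = [
        '<lang xml:lang="en-US">Thank you, </lang>',
        '<lang xml:lang="es-ES">muchas gracias</lang>',
    ]
    print(wrap_with_prosody(spans, 0.5))
```

Using one outer `<prosody>` element keeps the settings identical for all spans by construction; a per-span variant would be needed if voices from different providers respond differently to the same values.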
Business Value
Enables more natural and engaging voice interactions for global audiences, improving customer experiences in multilingual support, content localization, and virtual assistants.