📄 Abstract
Recent speech-to-speech (S2S) models generate intelligible speech but still
lack natural expressiveness, largely due to the absence of a reliable
evaluation metric. Existing approaches, such as subjective MOS ratings,
low-level acoustic features, and emotion recognition, are costly, limited, or
incomplete. To address this, we present DeEAR (Decoding the Expressive
Preference of eAR), a framework that converts human preference for speech
expressiveness into an objective score. Grounded in phonetics and psychology,
DeEAR evaluates speech across three dimensions: Emotion, Prosody, and
Spontaneity, achieving strong alignment with human perception (Spearman's Rank
Correlation Coefficient, SRCC = 0.86) using fewer than 500 annotated samples.
Beyond reliable scoring, DeEAR enables fair benchmarking and targeted data
curation. It not only distinguishes expressiveness gaps across S2S models but
also selects 14K expressive utterances to form ExpressiveSpeech, which improves
the expressive score (from 2.0 to 23.4 on a 100-point scale) of S2S models.
Demos and code are available at
https://github.com/FreedomIntelligence/ExpressiveSpeech
Authors (6)
Zhiyu Lin
Jingwen Yang
Jiale Zhao
Meng Liu
Sunzhu Li
Benyou Wang
Submitted
October 23, 2025
Key Contributions
DeEAR is a framework that converts human preference for speech expressiveness into an objective score, addressing the limitations of existing approaches: subjective MOS ratings are costly, and low-level acoustic features and emotion recognition are limited or incomplete. Grounded in phonetics and psychology, it evaluates speech along three dimensions (Emotion, Prosody, and Spontaneity) and achieves strong alignment with human perception (SRCC = 0.86) using fewer than 500 annotated samples. This enables fair benchmarking of S2S models and targeted data curation, including the selection of 14K expressive utterances that form the ExpressiveSpeech dataset.
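The SRCC figure quoted above is Spearman's rank correlation between a metric's scores and human ratings of the same utterances. The sketch below shows how such a value can be computed in general. It is not DeEAR's code: the function names (`average_ranks`, `spearman_srcc`) and the toy numbers are assumptions made for illustration, and they are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_srcc(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: human ratings vs. an automatic metric's scores.
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [10, 30, 20, 40, 50]
rho = spearman_srcc(human_ratings, metric_scores)  # 0.9 for this toy data
```

Because the statistic depends only on rank order, the metric and the human ratings can live on different scales (for example, a 5-point rating and a 100-point score) without affecting the result.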
Business Value
Enables the development of more natural and engaging synthetic voices for applications like virtual assistants, audiobooks, and customer service, leading to improved user experience and engagement.