Evaluates four Spoken Language Models (SLMs) on speech emotion recognition using emotionally incongruent speech, in which the emotion conveyed by the spoken words conflicts with the emotion conveyed by the vocal delivery. The study finds that SLMs rely predominantly on textual semantics rather than acoustic cues, indicating that text-derived representations dominate their emotion predictions.
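The evaluation logic can be illustrated with a minimal sketch (all names, stimuli, and the `predict_emotion` wrapper below are hypothetical, not the paper's actual models, dataset, or scoring code): for each clip whose words and vocal tone express different emotions, check whether the model's prediction follows the text-based label or the acoustic label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IncongruentClip:
    """A clip whose words and prosody convey different emotions."""
    audio_path: str
    transcript: str
    text_label: str   # emotion implied by the words
    audio_label: str  # emotion conveyed by the voice


def modality_reliance(
    clips: List[IncongruentClip],
    predict_emotion: Callable[[str], str],  # wraps an SLM call; stubbed below
) -> Dict[str, float]:
    """Count how often predictions agree with the text vs. the acoustic label."""
    text_hits = audio_hits = 0
    for clip in clips:
        pred = predict_emotion(clip.audio_path)
        text_hits += pred == clip.text_label
        audio_hits += pred == clip.audio_label
    n = max(len(clips), 1)
    return {"text_consistency": text_hits / n, "audio_consistency": audio_hits / n}


if __name__ == "__main__":
    # Toy stand-in for a real SLM; an actual run would query the model on the audio.
    demo_clips = [
        IncongruentClip("clip1.wav", "I just won the lottery!", "happy", "sad"),
        IncongruentClip("clip2.wav", "Everything is ruined.", "sad", "happy"),
    ]
    always_follows_text = lambda path: {"clip1.wav": "happy", "clip2.wav": "sad"}[path]
    print(modality_reliance(demo_clips, always_follows_text))
    # A text_consistency near 1.0 with low audio_consistency is the kind of
    # text-dominance pattern the study reports for current SLMs.
```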
Provides critical insight into the limitations of current spoken language models in understanding nuanced human emotions, guiding the development of more robust, genuinely multimodal AI systems for applications such as customer service and mental health monitoring.