Research Paper. Intended audience: AI researchers, speech processing engineers, linguists, developers of affective computing systems

Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features

Abstract

Speech Emotion Recognition (SER) is a key affective computing technology that enables emotionally intelligent artificial intelligence. While SER is challenging in general, it is particularly difficult for low-resource languages such as Urdu. This study investigates Urdu SER in a cross-corpus setting, an area that has remained largely unexplored. We employ a cross-corpus evaluation framework across three different Urdu emotional speech datasets to test model generalization. Two standard domain-knowledge-based acoustic feature sets, eGeMAPS and ComParE, are used to represent speech signals as feature vectors, which are then passed to Logistic Regression and Multilayer Perceptron classifiers. Classification performance is assessed using unweighted average recall (UAR) to account for class-label imbalance. Results show that self-corpus validation often overestimates performance, with UAR exceeding that of cross-corpus evaluation by up to 13%, underscoring that cross-corpus evaluation offers a more realistic measure of model robustness. Overall, this work emphasizes the importance of cross-corpus validation for Urdu SER, and its implications contribute to advancing affective computing research for underrepresented language communities.
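
UAR is the mean of the per-class recalls, so every emotion class counts equally regardless of how many clips it contains; that is why it is preferred over plain accuracy on imbalanced emotion datasets. The dependency-free Python sketch below computes it. The labels and class counts are hypothetical and are not taken from the paper's datasets; for classes that appear in `y_true`, the value should agree with scikit-learn's `balanced_accuracy_score`.

```python
from collections import Counter

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class in y_true counts equally, whatever its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)                                        # true samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # correct predictions per class
    recalls = [correct[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)

def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly (dominated by the large classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced test set: 8 "neutral" clips and 2 "angry" clips.
y_true = ["neutral"] * 8 + ["angry"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutral"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.2f}")
```

The majority-class predictor reaches 80% accuracy on this toy set but only 50% UAR, which is chance level for two classes.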

Authors (4)

Unzela Talpur

Zafi Sherhan Syed

Muhammad Shehram Shah Syed

Abbas Shah Syed

Submitted

October 28, 2025

arXiv Category

cs.SD

Key Contributions

This paper investigates the challenging area of cross-corpus validation for Speech Emotion Recognition (SER) in Urdu, a low-resource language. It shows that self-corpus validation can significantly overestimate performance, with the UAR obtained under self-corpus validation exceeding the cross-corpus UAR by up to 13%, underscoring the importance of cross-corpus evaluation for building reliable SER models.
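
The contrast between self-corpus and cross-corpus validation can be illustrated with a toy experiment. This is an assumption-laden sketch, not the authors' protocol: the three corpora (`A`, `B`, `C`) are synthetic two-dimensional feature sets built by a hypothetical `toy_corpus` helper, in which a constant offset (`shift`) stands in for differences in recording conditions or speakers; a nearest-centroid classifier is swapped in for Logistic Regression and the MLP to keep the example dependency-free; and each corpus is split deterministically into even-index (train) and odd-index (test) samples. Self-corpus runs train and test on splits of the same corpus, while cross-corpus runs train on one corpus and test on each of the others.

```python
from itertools import permutations

def uar(y_true, y_pred):
    """Unweighted average recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def toy_corpus(shift):
    """Hypothetical 2-D 'acoustic features'; `shift` mimics a corpus-level offset
    (e.g. microphone, room, or speaker differences)."""
    X, y = [], []
    for label, (cx, cy) in (("neutral", (0.0, 0.0)), ("angry", (2.0, 2.0))):
        for i in range(8):
            jitter = (i % 4) * 0.1
            X.append([cx + shift + jitter, cy + shift + jitter])
            y.append(label)
    return X, y

def split(X, y):
    """Deterministic split: even indices for training, odd indices for testing."""
    return X[0::2], y[0::2], X[1::2], y[1::2]

def fit_centroids(X, y):
    """Nearest-centroid classifier (a stand-in for LR/MLP): one mean vector per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, X):
    """Assign each sample to the class with the closest centroid (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], x)) for x in X]

def evaluate(corpora):
    """Self-corpus: train and test on splits of the same corpus.
    Cross-corpus: train on one corpus's training split, test on another corpus's test split."""
    splits = {name: split(*data) for name, data in corpora.items()}
    pairs = [(c, c) for c in corpora] + list(permutations(corpora, 2))
    results = {}
    for train_name, test_name in pairs:
        X_tr, y_tr, _, _ = splits[train_name]
        _, _, X_te, y_te = splits[test_name]
        results[(train_name, test_name)] = uar(y_te, predict(fit_centroids(X_tr, y_tr), X_te))
    return results

corpora = {"A": toy_corpus(0.0), "B": toy_corpus(0.9), "C": toy_corpus(1.5)}
results = evaluate(corpora)
self_scores = [v for (tr, te), v in results.items() if tr == te]
cross_scores = [v for (tr, te), v in results.items() if tr != te]

for (tr, te), v in sorted(results.items()):
    print(f"train={tr} test={te} UAR={v:.2f}")
print(f"mean self-corpus UAR:  {sum(self_scores) / len(self_scores):.3f}")
print(f"mean cross-corpus UAR: {sum(cross_scores) / len(cross_scores):.3f}")
```

On this toy data every self-corpus run reaches a UAR of 1.0, while A→B drops to 0.75 and A→C and C→A drop to 0.5, so averaging only self-corpus runs would overstate how well the classifier transfers. The size of the drop comes entirely from the synthetic `shift` and is unrelated to the gap reported in the paper.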

Business Value

Enables the development of more reliable emotionally intelligent AI systems for Urdu-speaking populations, improving applications in areas like mental health support and personalized user experiences.

Paper Metadata

Innovation Type

Methodology

Deployment Feasibility

Feasible, as it uses standard acoustic features and common ML classifiers. The focus on cross-corpus validation gives more realistic estimates of how a model will perform in real-world deployment.

Limitations Addressed

Overestimation of performance due to dataset-specific biases in SER models, particularly for low-resource languages.

Performance Gains

The main quantitative result is a gap rather than a gain: self-corpus validation yields UAR up to 13% higher than cross-corpus validation.

Technical Tags

Speech Emotion Recognition, Urdu, Cross-corpus validation, Acoustic features, eGeMAPS, ComParE, Logistic Regression, Multilayer Perceptron, Unweighted Average Recall (UAR), Low-resource languages

Research Topics

Affective Computing, Speech Processing, Machine Learning for Low-Resource Languages, Model Generalization, Feature Engineering

Methods & Architectures

Cross-corpus evaluation, Feature extraction (eGeMAPS, ComParE), Logistic Regression, Multilayer Perceptron
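
Once each utterance has been reduced to a fixed-length feature vector (for example, openSMILE's eGeMAPSv02 functionals, 88 values per utterance, or ComParE_2016, 6373 values), classification becomes a standard tabular problem. The sketch below is a minimal, dependency-free multinomial logistic regression trained by full-batch gradient descent. Per-feature standardization is estimated on the training data only and then applied unchanged to the test data, a common precaution when the test data may come from a different corpus. The two-dimensional feature values, the class labels, and the hyperparameters (`lr`, `epochs`, `l2`) are hypothetical; the paper's actual preprocessing and classifier settings are not described in this summary. In practice one would typically use scikit-learn's `LogisticRegression` and `MLPClassifier` instead of hand-written code.

```python
import math

def fit_standardizer(X):
    """Per-feature mean and std, estimated on the training data only."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    # Guard against constant features (std == 0) by falling back to 1.0.
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return mean, std

def standardize(X, mean, std):
    return [[(v - m) / s for v, m, s in zip(row, mean, std)] for row in X]

def softmax(z):
    top = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def scores(W, b, x):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]

def train_logreg(X, y, classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression: gradient descent on mean cross-entropy + L2 on W."""
    n, d, K = len(X), len(X[0]), len(classes)
    index = {c: k for k, c in enumerate(classes)}
    W = [[0.0] * d for _ in range(K)]
    b = [0.0] * K
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(K)]
        gb = [0.0] * K
        for x, label in zip(X, y):
            p = softmax(scores(W, b, x))
            for k in range(K):
                err = p[k] - (1.0 if k == index[label] else 0.0)   # dLoss/dscore_k
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        for k in range(K):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * (gW[k][j] / n + l2 * W[k][j])
    return W, b

def predict(W, b, classes, X):
    out = []
    for x in X:
        s = scores(W, b, x)
        out.append(classes[s.index(max(s))])
    return out

# Hypothetical low-dimensional stand-ins for eGeMAPS/ComParE functionals.
X_train = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
           [5.0, 0.0], [5.5, 0.0], [5.0, 0.5],
           [0.0, 5.0], [0.5, 5.0], [0.0, 5.5]]
y_train = ["neutral"] * 3 + ["angry"] * 3 + ["sad"] * 3
X_test = [[0.2, 0.2], [5.2, 0.2], [0.2, 5.2]]

classes = sorted(set(y_train))
mean, std = fit_standardizer(X_train)             # statistics from training data only
W, b = train_logreg(standardize(X_train, mean, std), y_train, classes)
print(predict(W, b, classes, standardize(X_test, mean, std)))
```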

Applications & Tasks

Human-Computer Interaction, Mental Health Monitoring, Customer Service, Emotion Recognition, Generalization across datasets, Performance estimation, Speech Emotion Recognition in Urdu

Datasets & Benchmarks

Datasets

Three different Urdu emotional speech datasets

Metrics

Unweighted Average Recall (UAR)

Related Fields

Natural Language Processing, Machine Learning, Linguistics, Psychology

Keywords

Speech Emotion Recognition, Urdu, Cross-corpus, Validation, Acoustic Features, eGeMAPS, ComParE, Logistic Regression, Multilayer Perceptron, UAR, Low-resource, Affective Computing, Generalization, Performance Estimation

Commercial Potential

Potential Products

Emotionally aware chatbots, Call center analytics tools, Mental health monitoring applications

Target Industries

Technology, Healthcare, Customer Service, Media

Use Case Examples

Analyzing customer sentiment in call centers, Developing empathetic AI companions, Assessing emotional states in therapeutic settings

Competitive Edge

Focuses on the critical but underexplored aspect of cross-corpus validation for Urdu SER, aiming to provide more realistic performance estimates than single-dataset evaluations.

Market Opportunity

Growing market for affective computing and AI-driven personalization.

Revenue Models

Licensing of models, API access, Development of specialized applications

Resource Requirements

Compute Needs

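Low to moderate: Logistic Regression and MLP classifiers operate on fixed-length feature vectors, so training is lightweight; acoustic feature extraction is typically the main per-utterance cost.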

Data Requirements

Labeled speech data in Urdu, ideally from multiple distinct corpora.

Deployment Constraints

Requires access to speech data and computational resources for inference. Performance may vary depending on the target domain's acoustic conditions.

Scalability

The feature extraction and classification methods are generally scalable. Cross-corpus validation itself requires multiple datasets, which can be a scaling challenge.

Regulatory Considerations

Data privacy concerns related to voice recordings and emotional content.

Production Readiness

Maturity Level

Research

Time to Market

1-3 years for robust product integration.
