Quechua Speech Datasets in Common Voice: The Case of Puno Quechua

📄 Abstract

Under-resourced languages, such as the Quechua languages, face data and resource scarcity that hinders their development in speech technology. To address this issue, Common Voice presents a crucial opportunity to foster open, community-driven speech dataset creation. This paper examines the integration of Quechua languages into Common Voice. We detail the current 17 Quechua languages, presenting Puno Quechua (ISO 639-3: qxp) as a focused case study that includes language onboarding and corpus collection of both read and spontaneous speech data. Our results demonstrate that Common Voice now hosts 191.1 hours of Quechua speech (86% validated), with Puno Quechua contributing 12 hours (77% validated), highlighting Common Voice's potential. We further propose a research agenda addressing technical challenges, alongside ethical considerations for community engagement and indigenous data sovereignty. Our work contributes towards inclusive voice technology and the digital empowerment of under-resourced language communities.
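
As a quick reading aid, the percentages quoted in the abstract can be turned into approximate validated hours. The sketch below is only arithmetic on the abstract's figures. The helper name `validated_hours` is hypothetical, and because the published percentages are rounded, the results (roughly 164 validated hours overall and roughly 9 for Puno Quechua) are estimates, not numbers reported by the paper.

```python
def validated_hours(total_hours: float, validated_pct: float) -> float:
    """Estimate validated hours from a total and a validated percentage."""
    return total_hours * validated_pct / 100.0


# Figures quoted in the abstract (percentages are rounded in the source).
all_quechua_validated = validated_hours(191.1, 86)
puno_validated = validated_hours(12, 77)

# Puno Quechua's share of all recorded Quechua hours, in percent.
puno_share_pct = 100.0 * 12 / 191.1

print(round(all_quechua_validated, 1), round(puno_validated, 1), round(puno_share_pct, 1))
```

On these figures, Puno Quechua accounts for a little over 6% of the recorded Quechua hours.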
Authors (4)
Elwin Huaman
Wendi Huaman
Jorge Luis Huaman
Ninfa Quispe
Submitted
October 13, 2025
arXiv Category
cs.CL

Key Contributions

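This paper addresses data scarcity for under-resourced languages such as the Quechua languages by detailing how Puno Quechua (ISO 639-3: qxp) was onboarded to Common Voice and how read and spontaneous speech was collected for it. It reports 191.1 hours of Quechua speech on the platform (86% validated), 12 of them Puno Quechua (77% validated), as evidence of what community-driven collection can achieve, and it proposes a research agenda covering technical challenges and ethical considerations, including community engagement and indigenous data sovereignty.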

Business Value

Enabling speech technology for previously underserved linguistic communities can unlock new markets and applications, fostering digital inclusion and economic opportunities for these populations.