ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Abstract

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.
Authors: Emily Chang, Niyati Bafna
Submitted: October 19, 2025
arXiv Category: cs.CL

Key Contributions

Introduces ChiKhaPo, a large-scale multilingual benchmark of 8 subtasks for evaluating lexical comprehension and generation in LLMs, with coverage of 2700+ languages for 2 of those subtasks. The benchmark addresses a gap in evaluation: existing benchmarks focus on high- or mid-resource languages, while LLMs often lack basic linguistic competence in most of the world's written languages.

Business Value

Supports the development of LLMs that work more equitably across a much wider range of the world's languages, which could open new markets and applications for AI in diverse linguistic communities.