AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages

📄 Abstract

Text embeddings are an essential building block of several NLP tasks, such as retrieval-augmented generation, which is crucial for preventing hallucinations in LLMs. Despite the recent release of the massively multilingual MTEB (MMTEB), African languages remain underrepresented, with existing tasks often repurposed from translation benchmarks such as FLORES clustering or SIB-200. In this paper, we introduce AfriMTEB, a regional expansion of MMTEB covering 59 languages, 14 tasks, and 38 datasets, including six newly added datasets. Unlike many MMTEB datasets that include fewer than five languages, the new additions span 14 to 56 African languages and introduce entirely new tasks that were not previously covered, such as hate speech detection, intent detection, and emotion classification. Complementing this, we present AfriE5, an adaptation of the instruction-tuned mE5 model to African languages through cross-lingual contrastive distillation. Our evaluation shows that AfriE5 achieves state-of-the-art performance, outperforming strong baselines such as Gemini-Embeddings and mE5.
Authors (3)
Kosei Uemura
Miaoran Zhang
David Ifeoluwa Adelani
Submitted
October 27, 2025
arXiv Category
cs.CL

Key Contributions

This paper introduces AfriMTEB, a regional expansion of MMTEB covering 59 languages, 14 tasks, and 38 datasets, to address the underrepresentation of African languages in existing multilingual embedding benchmarks. It also presents AfriE5, an adaptation of the instruction-tuned mE5 model trained via cross-lingual contrastive distillation, which the authors report outperforms baselines such as Gemini-Embeddings and mE5.
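
To give a rough sense of what a cross-lingual contrastive objective looks like, here is a minimal sketch of a generic in-batch contrastive (InfoNCE-style) loss over parallel sentence embeddings. This summary does not spell out AfriE5's training objective, so this is an illustration of the general idea only, not the authors' method. The function names, the temperature value, and the toy 2-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(src_embs, tgt_embs, temperature=0.05):
    """Mean in-batch contrastive loss.

    Row i of src_embs is treated as a translation of row i of tgt_embs;
    every other row in the batch serves as a negative example.
    The temperature of 0.05 is an illustrative assumption.
    """
    losses = []
    for i, src in enumerate(src_embs):
        logits = [cosine(src, tgt) / temperature for tgt in tgt_embs]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): two source sentences and their translations.
source = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # translations close to their sources
swapped = [[0.1, 0.9], [0.9, 0.1]]   # translations paired with the wrong source

loss_aligned = info_nce(source, aligned)
loss_swapped = info_nce(source, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each sentence's embedding sits closest to its own translation and large when the pairs are mismatched, which is the property that pulls parallel sentences together in a shared embedding space.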

Business Value

Enables the development of NLP applications and services that are more inclusive and effective for speakers of African languages, opening up new markets and improving accessibility.