Abstract
Mathematical reasoning remains one of the most challenging domains for large
language models (LLMs), requiring not only linguistic understanding but also
structured logical deduction and numerical precision. While recent LLMs
demonstrate strong general-purpose reasoning abilities, their mathematical
competence across diverse languages remains underexplored. Existing benchmarks
primarily focus on English or a narrow subset of high-resource languages,
leaving significant gaps in assessing multilingual and cross-lingual
mathematical reasoning. To address this, we introduce MathMist, a parallel
multilingual benchmark for mathematical problem solving and reasoning. MathMist
encompasses over 21K aligned question-answer pairs across seven languages,
with balanced coverage of high-, medium-, and low-resource linguistic
settings. The dataset spans linguistic variety and multiple types of problem
settings, and it probes the ability to synthesize solutions. We systematically evaluate a
diverse suite of models, including open-source small and medium LLMs,
proprietary systems, and multilingual-reasoning-focused models, under
zero-shot, chain-of-thought (CoT), and code-switched reasoning paradigms. Our
results reveal persistent deficiencies in LLMs' ability to perform consistent
and interpretable mathematical reasoning across languages, with pronounced
degradation in low-resource settings. All code and data are available on
GitHub: https://github.com/mahbubhimel/MathMist
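The abstract names three evaluation paradigms: zero-shot, chain-of-thought, and code-switched reasoning. A minimal Python sketch of how prompts for these paradigms could be built for a single question, plus a simple answer extractor, is given here. The `Item` schema, the template wording, the `Answer:` marker, and especially the reading of "code-switched" as "question in one language, reasoning requested in another" are all assumptions for illustration. They are not MathMist's actual prompts, and the repository's evaluation code may work differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """One benchmark question in one language (hypothetical schema)."""
    question: str
    language: str  # placeholder language name, e.g. "English"


def build_prompt(item: Item, paradigm: str,
                 reasoning_language: Optional[str] = None) -> str:
    """Return a prompt for "zero_shot", "cot", or "code_switched" (illustrative wording)."""
    if paradigm == "zero_shot":
        # Ask for the answer directly, without requesting intermediate steps.
        return f"Question: {item.question}\nGive only the final answer after 'Answer:'."
    if paradigm == "cot":
        # Chain-of-thought: request step-by-step reasoning before the answer.
        return (f"Question: {item.question}\n"
                "Think step by step, then give the final answer after 'Answer:'.")
    if paradigm == "code_switched":
        # ASSUMPTION: the question stays in its own language while the model
        # is asked to reason in a different one.
        if reasoning_language is None or reasoning_language == item.language:
            raise ValueError("code_switched needs a reasoning_language different from the question's")
        return (f"Question ({item.language}): {item.question}\n"
                f"Reason step by step in {reasoning_language}, "
                "then give the final answer after 'Answer:'.")
    raise ValueError(f"unknown paradigm: {paradigm!r}")


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    item = Item("What is 7 * 6?", "English")
    print(build_prompt(item, "cot"))
    print(extract_answer("7 * 6 = 42.\nAnswer: 42"))  # prints 42
```

Taking the last `Answer:` occurrence is a deliberate choice: with chain-of-thought output, a model may restate intermediate answers before committing to a final one.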
Authors (5)
Mahbub E Sobhani
Md. Faiyaz Abdullah Sayeedi
Tasnim Mohiuddin
Md Mofijul Islam
Swakkhar Shatabda
Submitted
October 16, 2025
Key Contributions
MathMist introduces a parallel multilingual benchmark dataset for mathematical problem solving and reasoning, comprising over 21K aligned question-answer pairs across seven languages. It addresses the gap in evaluating LLMs' mathematical capabilities across diverse linguistic settings, including low-resource languages.
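Because the question-answer pairs are aligned across languages, each item can be scored in every language and the results compared directly. A minimal Python sketch can do this under an assumed layout in which each aligned item ID maps language codes to a correct/incorrect flag. It computes per-language accuracy, the accuracy gap to a chosen reference language, and a simple consistency rate (the share of items answered correctly in every language or incorrectly in every language). The metric names, the `lang_a`-style codes, and the toy data are hypothetical and are not claimed to be the paper's metrics or results.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical layout: results[item_id][language] is True when the model's
# answer to that language's version of the aligned item was judged correct.
Results = Mapping[str, Mapping[str, bool]]


def per_language_accuracy(results: Results) -> Dict[str, float]:
    """Accuracy for each language over all aligned items that include it."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for by_lang in results.values():
        for lang, ok in by_lang.items():
            total[lang] += 1
            correct[lang] += int(ok)
    return {lang: correct[lang] / total[lang] for lang in total}


def gap_to_reference(acc: Mapping[str, float], reference: str) -> Dict[str, float]:
    """Accuracy drop of each language relative to a chosen reference language."""
    return {lang: acc[reference] - a for lang, a in acc.items() if lang != reference}


def cross_lingual_consistency(results: Results) -> float:
    """Share of aligned items with the same outcome (all right or all wrong) in every language."""
    if not results:
        return 0.0
    consistent = sum(1 for by_lang in results.values()
                     if len(set(by_lang.values())) == 1)
    return consistent / len(results)


# Toy data with placeholder language codes (not real MathMist results).
example_results = {
    "q1": {"lang_a": True, "lang_b": True, "lang_c": False},
    "q2": {"lang_a": True, "lang_b": False, "lang_c": False},
    "q3": {"lang_a": False, "lang_b": False, "lang_c": False},
    "q4": {"lang_a": True, "lang_b": True, "lang_c": True},
}

if __name__ == "__main__":
    acc = per_language_accuracy(example_results)
    print(acc)
    print(gap_to_reference(acc, "lang_a"))
    print(cross_lingual_consistency(example_results))
```

On this toy data, `lang_a` scores 0.75, `lang_c` scores 0.25, and half of the items are consistent across languages. A low-resource degradation like the one the abstract reports would show up as larger positive values in `gap_to_reference`.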
Business Value
Provides a resource for measuring and improving LLM mathematical reasoning across languages, which can support the development of more capable multilingual educational tools, research assistants, and problem-solving systems.