Rethinking Cross-lingual Gaps from a Statistical Viewpoint

📄 Abstract

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried in target languages. Prior research has pointed to a cross-lingual gap, viz., a drop in accuracy when the knowledge is queried in a target language compared to when the query is in the source language. Existing research has rationalized divergence between latent representations in the source and target languages as the source of the cross-lingual gap. In this work, we take an alternative view and hypothesize that the variance of responses in the target language is the main cause of this gap. For the first time, we formalize the cross-lingual gap in terms of a bias-variance decomposition. We present extensive experimental evidence that supports the proposed formulation and hypothesis. We then reinforce our hypothesis through multiple inference-time interventions that control the variance and reduce the cross-lingual gap. We demonstrate a simple prompt instruction that reduces response variance and improves target-language accuracy by 20-25% across different models.
Authors (5)
Vihari Piratla
Purvam Jain
Darshan Singh
Partha Talukdar
Trevor Cohn
Submitted: October 17, 2025
arXiv Category: cs.CL

Key Contributions

This paper re-examines the cross-lingual gap in LLMs from a statistical perspective, hypothesizing that response variance, rather than latent representation divergence, is the main cause. It formalizes the gap using bias-variance decomposition and provides extensive experimental evidence supporting this hypothesis, offering a new framework for understanding and potentially mitigating cross-lingual performance differences.
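The intuition behind a bias-variance reading of the gap can be illustrated with a toy Monte-Carlo simulation. This is a hedged sketch, not the paper's actual formalization: the model names, the Gaussian noise model, and the specific bias and variance values below are all assumptions chosen for illustration. Responses are modeled as the true answer plus a systematic bias and random noise; holding bias fixed while inflating the noise (as hypothesized for target-language responses) inflates expected error.

```python
import random

def expected_error(bias, noise_std, n_samples=100_000, seed=0):
    """Monte-Carlo estimate of mean squared error for responses drawn
    around a biased mean: response = truth + bias + Gaussian noise.
    The classical decomposition predicts MSE ~= bias**2 + noise_std**2."""
    rng = random.Random(seed)
    truth = 1.0
    total = 0.0
    for _ in range(n_samples):
        response = truth + bias + rng.gauss(0.0, noise_std)
        total += (response - truth) ** 2
    return total / n_samples

# Toy scenario: identical bias, but higher response variance in the
# "target language" setting drives the error gap.
source_err = expected_error(bias=0.1, noise_std=0.1)  # ~ 0.01 + 0.01
target_err = expected_error(bias=0.1, noise_std=0.5)  # ~ 0.01 + 0.25
gap = target_err - source_err
```

In this toy model, shrinking `noise_std` for the target setting closes most of the gap without touching the bias term, mirroring the paper's claim that variance-controlling inference-time interventions reduce the cross-lingual gap.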

Business Value

A better statistical understanding of cross-lingual gaps can lead to more equitable and reliable LLM performance across different languages, crucial for global applications and accessibility.