Abstract
Large language models exhibit uneven performance across languages, with substantial gaps between high- and low-resource languages. We present a framework for enhancing the monolingual capabilities of LLMs in underrepresented languages while preserving their general-purpose performance through targeted fine-tuning of language-specific subnetworks. Our approach identifies language-specific neurons using Language Activation Probability Entropy and fine-tunes only the weights associated with these neurons (a dedicated subnetwork) on target-language data. Experiments on Llama-3.1-8B and Mistral-Nemo-12B across 12 mid- and low-resource languages demonstrate that our method consistently outperforms full fine-tuning, FFN-only fine-tuning, LoRA adaptation, and random-subset fine-tuning baselines while updating at most 1% of model parameters. Beyond performance improvements, we observe more favorable training dynamics, stronger cross-lingual representational alignment, and systematic changes in weight updates. To facilitate future research, we release language-specific neuron identifications for over 100 languages as well as our adaptation pipeline, offering a cost-effective pathway for adapting state-of-the-art models to underrepresented languages.
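The neuron-selection step described above can be illustrated with a short sketch. The following is a minimal, hypothetical implementation of Language Activation Probability Entropy (LAPE) scoring, assuming per-language activation probabilities for each FFN neuron have already been estimated on held-out text; the function names and thresholds (`entropy_quantile`, `prob_threshold`) are illustrative and are not the paper's released pipeline or settings.

```python
# Minimal LAPE sketch (illustrative, not the authors' released pipeline).
# act_probs[l, j] = estimated probability that FFN neuron j activates (> 0)
# on text of language l, measured on held-out corpora.
import torch

def lape_scores(act_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Return per-neuron entropy over languages; act_probs is [num_langs, num_neurons]."""
    # Normalize each neuron's activation probabilities into a distribution over languages.
    dist = act_probs / (act_probs.sum(dim=0, keepdim=True) + eps)
    # Low entropy => the neuron fires almost exclusively for one or a few languages.
    return -(dist * torch.log(dist + eps)).sum(dim=0)

def select_language_neurons(act_probs: torch.Tensor, lang_idx: int,
                            entropy_quantile: float = 0.01,
                            prob_threshold: float = 0.2) -> torch.Tensor:
    """Pick low-entropy neurons that also fire reliably for the target language."""
    entropy = lape_scores(act_probs)
    low_entropy = entropy <= torch.quantile(entropy, entropy_quantile)
    active_in_lang = act_probs[lang_idx] >= prob_threshold
    return torch.nonzero(low_entropy & active_in_lang, as_tuple=False).squeeze(-1)
```

Keeping only the lowest-entropy slice of neurons is what makes the resulting subnetwork sparse enough that at most roughly 1% of the model's parameters are touched during adaptation.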
Key Contributions
The paper proposes a framework for enhancing LLM performance in underrepresented languages by fine-tuning sparse subnetworks identified with Language Activation Probability Entropy. The method outperforms full fine-tuning and other parameter-efficient fine-tuning (PEFT) baselines while updating at most 1% of parameters, preserving general capabilities, and improving training dynamics and cross-lingual alignment.
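One simple way to realize "fine-tune only the weights associated with these neurons" is gradient masking on the FFN projections. The sketch below assumes a Hugging Face LLaMA/Mistral-style module layout (`model.model.layers[i].mlp` with `gate_proj`, `up_proj`, `down_proj`); how the released adaptation pipeline restricts the update may differ.

```python
# Sketch: restrict training to a language-specific FFN subnetwork via gradient masking.
# Assumes a LLaMA/Mistral-style MLP: gate_proj/up_proj weights are [d_ff, d_model],
# down_proj is [d_model, d_ff]; neuron_ids index the d_ff (intermediate) dimension.
import torch

def mask_mlp_gradients(mlp, neuron_ids: torch.Tensor) -> None:
    d_ff = mlp.gate_proj.weight.shape[0]
    keep = torch.zeros(d_ff, dtype=mlp.gate_proj.weight.dtype,
                       device=mlp.gate_proj.weight.device)
    keep[neuron_ids] = 1.0
    # gate_proj / up_proj: the neuron index is the output row.
    mlp.gate_proj.weight.register_hook(lambda g: g * keep.unsqueeze(1))
    mlp.up_proj.weight.register_hook(lambda g: g * keep.unsqueeze(1))
    # down_proj: the neuron index is the input column.
    mlp.down_proj.weight.register_hook(lambda g: g * keep.unsqueeze(0))

def restrict_training_to_subnetwork(model, neuron_ids_per_layer: dict) -> None:
    """Freeze everything, then re-enable only the FFN weights tied to selected neurons."""
    for p in model.parameters():
        p.requires_grad_(False)
    for layer_idx, neuron_ids in neuron_ids_per_layer.items():
        mlp = model.model.layers[layer_idx].mlp  # assumed HF LLaMA/Mistral layout
        for proj in (mlp.gate_proj, mlp.up_proj, mlp.down_proj):
            proj.weight.requires_grad_(True)
        mask_mlp_gradients(mlp, neuron_ids)
```

One design consideration with this masking approach: because the whole FFN weight matrices remain trainable tensors, an optimizer such as AdamW would still apply decoupled weight decay to the masked-out entries, so weight decay should be disabled (or the optimizer built only over the intended slices) if those entries are meant to stay exactly unchanged.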
Business Value
Enables the development of more equitable and globally accessible AI technologies by improving LLM performance for underrepresented languages, unlocking new markets and applications.