
Flexing in 73 Languages: A Single Small Model for Multilingual Inflection

Abstract

We present a compact, single-model approach to multilingual inflection, the task of generating inflected word forms from base lemmas to express grammatical categories. Our model, trained jointly on data from 73 languages, is lightweight, robust to unseen words, and outperforms monolingual baselines in most languages. This demonstrates the effectiveness of multilingual modeling for inflection and highlights its practical benefits: simplifying deployment by eliminating the need to manage and retrain dozens of separate monolingual models. In addition to the standard SIGMORPHON shared task benchmarks, we evaluate our monolingual and multilingual models on 73 Universal Dependencies (UD) treebanks, extracting lemma-tag-form triples and their frequency counts. To ensure realistic data splits, we introduce a novel frequency-weighted, lemma-disjoint train-dev-test resampling procedure. Our work addresses the lack of an open-source, general-purpose, multilingual morphological inflection system capable of handling unseen words across a wide range of languages, including Czech. All code is publicly released at: https://github.com/tomsouri/multilingual-inflection.
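The frequency-weighted, lemma-disjoint resampling described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' released code: the triple format, function name, and greedy assignment strategy are assumptions. The key invariants it demonstrates are that all triples sharing a lemma land in the same split (so test lemmas are unseen at training time) and that splits are balanced by corpus frequency mass rather than by raw lemma count.

```python
import random
from collections import defaultdict

def lemma_disjoint_split(triples, train=0.8, dev=0.1, seed=42):
    """Frequency-weighted, lemma-disjoint train/dev/test resampling (sketch).

    `triples` is a list of (lemma, tag, form, frequency) tuples, e.g. as
    extracted from a UD treebank. All triples of a lemma go to one split;
    lemmas are assigned greedily so each split approaches its target share
    of the total frequency mass.
    """
    # Group triples by lemma so a lemma can only appear in one split.
    by_lemma = defaultdict(list)
    for lemma, tag, form, freq in triples:
        by_lemma[lemma].append((lemma, tag, form, freq))

    total = sum(f for _, _, _, f in triples)
    targets = {"train": train * total, "dev": dev * total,
               "test": (1 - train - dev) * total}
    splits = {"train": [], "dev": [], "test": []}
    filled = {name: 0 for name in splits}

    rng = random.Random(seed)
    lemmas = list(by_lemma)
    rng.shuffle(lemmas)
    for lemma in lemmas:
        mass = sum(f for _, _, _, f in by_lemma[lemma])
        # Assign the lemma to the split with the largest remaining
        # frequency-mass deficit.
        name = max(splits, key=lambda k: targets[k] - filled[k])
        splits[name].extend(by_lemma[lemma])
        filled[name] += mass
    return splits
```

By construction, a model evaluated on the test split has never seen any of its lemmas during training, which is what makes the evaluation of unseen-word robustness realistic.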
Authors (2)
TomΓ‘Ε‘ Sourada
Jana StrakovΓ‘
Submitted
October 27, 2025
arXiv Category
cs.CL
Text, Speech, and Dialogue. TSD 2025. Lecture Notes in Computer Science, vol 16030. Springer, Cham, pp 39-50
arXiv PDF

Key Contributions

Introduces a single, compact model for multilingual inflection across 73 languages that outperforms monolingual baselines and is robust to unseen words. This simplifies deployment by eliminating the need for numerous separate models.

Business Value

Significantly reduces the operational overhead and complexity of deploying NLP solutions for morphologically rich languages, enabling broader language support with less effort.