Semantic Prosody in Machine Translation: the English-Chinese Case of Passive Structures

Abstract

Semantic prosody is a collocational meaning formed through the co-occurrence of a linguistic unit and a consistent series of collocates, which should be treated separately from semantic meaning. Since words that are literal translations of each other may have different semantic prosody, more attention should be paid to this linguistic property to generate accurate translations. However, current machine translation models cannot handle this problem. To bridge the gap, we propose an approach to teach machine translation models about semantic prosody of a specific structure. We focus on Chinese BEI passives and create a dataset of English-Chinese sentence pairs with the purpose of demonstrating the negative semantic prosody of BEI passives. Then we fine-tune OPUS-MT, NLLB-600M and mBART50 models with our dataset for the English-Chinese translation task. Our results show that fine-tuned MT models perform better on using BEI passives for translating unfavourable content and avoid using it for neutral and favourable content. Also, in NLLB-600M, which is a multilingual model, this knowledge of semantic prosody can be transferred from English-Chinese translation to other language pairs, such as Spanish-Chinese.

Authors (4)

Xinyue Ma

Pol Pastells

Mireia Farrús

Mariona Taulé

Submitted

October 16, 2025

arXiv Category

cs.CL

Key Contributions

Addresses the challenge of semantic prosody in machine translation, specifically for English-Chinese passive structures, by creating a specialized dataset and fine-tuning existing MT models. The fine-tuned models demonstrate improved performance in using BEI passives for translating unfavorable content.

Business Value

Enhances the quality and nuance of machine translations, particularly for sensitive content or specific linguistic structures, leading to more reliable cross-cultural communication and better understanding in business and diplomacy.

Paper Metadata

Innovation Type

Dataset and Fine-tuning Methodology

Deployment Feasibility

Moderate. Requires fine-tuning existing large translation models, which can be computationally intensive. The specialized dataset is key.

Limitations Addressed

Current machine translation models' inability to handle semantic prosody, which is crucial for accurate translation beyond literal meaning, particularly with structures like passive voice.

Performance Gains

Fine-tuned MT models are better at using BEI passives when translating unfavorable content and at avoiding them for neutral and favorable content.
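One rough way to observe this behaviour is to count how often the passive marker 被 (BEI) appears in a model's Chinese output, grouped by the polarity of the source content. The sketch below is illustrative only and is not the paper's evaluation procedure. The function name `bei_usage_rate`, the polarity labels, and the example sentences are all assumptions made up for this illustration, and a plain substring match will also count non-passive words containing 被 (such as 被子, "quilt").

```python
from collections import defaultdict

# Passive marker; a plain substring match is a simplifying assumption.
BEI = "被"

def bei_usage_rate(samples):
    """Return the share of translations containing BEI, per polarity label.

    samples: iterable of (polarity, chinese_translation) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for polarity, zh in samples:
        totals[polarity] += 1
        if BEI in zh:
            hits[polarity] += 1
    return {p: hits[p] / totals[p] for p in totals}

# Hypothetical model outputs, invented for this example.
samples = [
    ("unfavourable", "他的钱包被偷了。"),
    ("unfavourable", "房子被洪水冲毁了。"),
    ("favourable", "她获得了一等奖。"),
    ("favourable", "他被授予了奖章。"),
    ("neutral", "这本书出版于去年。"),
]

rates = bei_usage_rate(samples)
for polarity, rate in rates.items():
    print(f"{polarity}: {rate:.2f}")
```

A model that has picked up the negative prosody of BEI passives would be expected to show a high rate for unfavourable content and a low rate elsewhere; a proper evaluation would need a syntactic check rather than character matching.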

Technical Tags

Semantic prosody, Machine translation, English-Chinese translation, Passive structures, BEI passives, Collocational meaning, Fine-tuning, OPUS-MT, NLLB-600M, mBART50

Research Topics

Machine Translation, Linguistics, Cross-lingual NLP, Semantic Analysis, Chinese Language Processing

Methods & Architectures

Methods: Creation of a dataset of English-Chinese sentence pairs; Fine-tuning of MT models (OPUS-MT, NLLB-600M, mBART50); Focus on Chinese BEI passives

Architectures: OPUS-MT; NLLB-600M; mBART50

Applications & Tasks

Applications: Translation services; Cross-cultural communication; Linguistic research

Challenges: Current MT models cannot handle semantic prosody; Translating passive structures accurately; Capturing nuanced collocational meanings

Tasks: Improving machine translation accuracy; Translating English passive structures into Chinese BEI passives; Generating translations with appropriate semantic prosody

Datasets & Benchmarks

Datasets

Dataset of English-Chinese sentence pairs demonstrating negative semantic prosody of BEI passives

Translation quality, Appropriate use of BEI passives, Handling of unfavorable content

Related Fields

Machine Translation, Computational Linguistics, Linguistics, Natural Language Processing, Chinese Studies

Keywords

machine translation, semantic prosody, English-Chinese, passive voice, BEI passive, fine-tuning, NLLB, mBART, OPUS-MT, linguistics, collocation, NLP

Academic Context

#Machine Translation, #Linguistics, #Cross-lingual NLP, #Semantic Analysis, #Chinese Language Processing

Commercial Potential

Potential Products

Improved English-Chinese translation models; Specialized translation modules for passive structures; Tools for analyzing semantic prosody in text

Target Industries

Technology, Publishing, International Business, Diplomacy

Use Case Examples

Translating legal documents where passive voice is common; Ensuring nuanced and culturally appropriate translations of sensitive news articles; Improving cross-lingual communication tools

Competitive Edge

Focuses on the specific linguistic phenomenon of semantic prosody in translation, offering a targeted improvement over general-purpose MT models by incorporating specialized knowledge and fine-tuning.

Market Opportunity

Large global market for machine translation services.

Revenue Models

Licensing of fine-tuned models; API access for specialized translation.

Resource Requirements

Compute Needs

High for fine-tuning large MT models.

Data Requirements

Requires a carefully curated dataset of English-Chinese sentence pairs focusing on passive structures and semantic prosody.

Deployment Constraints

Fine-tuning requires significant computational resources; the improvement might be specific to the English-Chinese pair and passive structures.

Scalability

The fine-tuning approach can be applied to other language pairs and linguistic phenomena.

Production Readiness

Maturity Level

Research

Time to Market

2-3 years for robust integration into translation services.
