Quality-Aware Translation Tagging in Multilingual RAG system

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) often retrieves English documents and translates them into the query language in low-resource settings. However, poor translation quality degrades response generation performance. Existing approaches either assume that translation quality is sufficient or rely on rewriting, which introduces factual distortion and hallucinations. To mitigate these problems, we propose Quality-Aware Translation Tagging in mRAG (QTT-RAG), which explicitly evaluates translation quality along three dimensions (semantic equivalence, grammatical accuracy, and naturalness and fluency) and attaches these scores as metadata without altering the original content. We evaluate QTT-RAG against CrossRAG and DKM-RAG as baselines on two open-domain QA benchmarks (XORQA, MKQA) using six instruction-tuned LLMs ranging from 2.4B to 14B parameters, covering two low-resource languages (Korean and Finnish) and one high-resource language (Chinese). QTT-RAG outperforms the baselines by preserving factual integrity while enabling generator models to make informed decisions based on translation reliability. This approach allows cross-lingual documents to be used effectively in low-resource settings where native-language documents are limited, offering a practical and robust solution across multilingual domains.
Authors: Hoyeon Moon, Byeolhee Kim, Nikhil Verma
Submitted: October 27, 2025
arXiv Category: cs.CL

Key Contributions

Introduces Quality-Aware Translation Tagging (QTT-RAG) for multilingual RAG systems. The method explicitly evaluates translation quality along three dimensions (semantic equivalence, grammatical accuracy, and naturalness and fluency) and attaches the scores as metadata without altering the original content, which mitigates the performance degradation caused by poor translations.
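
The tagging idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `QualityScores`, `TaggedPassage`, `tag_translation`, `render_for_prompt`, and `dummy_scorer` are hypothetical, the tag layout is invented for illustration, and the numeric scale is an assumption, since the abstract does not state how scores are produced or formatted. The sketch shows only the property the abstract emphasizes: the translated text passes through unchanged, and the three scores travel alongside it as metadata.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityScores:
    # The three dimensions named in the abstract; the numeric scale is assumed.
    semantic_equivalence: float
    grammatical_accuracy: float
    naturalness_fluency: float


@dataclass(frozen=True)
class TaggedPassage:
    text: str              # translated passage, stored unmodified
    scores: QualityScores  # quality metadata attached to the passage


# Hypothetical scorer interface: (source_text, translated_text) -> QualityScores.
Scorer = Callable[[str, str], QualityScores]


def tag_translation(source: str, translation: str, scorer: Scorer) -> TaggedPassage:
    """Attach quality scores to a translation without rewriting it."""
    return TaggedPassage(text=translation, scores=scorer(source, translation))


def render_for_prompt(passage: TaggedPassage) -> str:
    """Prefix the passage with a metadata line (an invented format)."""
    s = passage.scores
    header = (
        "[translation quality: "
        f"semantic_equivalence={s.semantic_equivalence:.1f}, "
        f"grammatical_accuracy={s.grammatical_accuracy:.1f}, "
        f"naturalness_fluency={s.naturalness_fluency:.1f}]"
    )
    return f"{header}\n{passage.text}"


def dummy_scorer(source: str, translation: str) -> QualityScores:
    # Placeholder with fixed values; a real scorer would evaluate the pair.
    return QualityScores(4.0, 3.5, 3.0)


source_text = "Helsinki is the capital of Finland."
translated_text = "Helsinki on Suomen pääkaupunki."
tagged = tag_translation(source_text, translated_text, dummy_scorer)
rendered = render_for_prompt(tagged)
print(rendered)
```

Keeping the scorer as an injected callable leaves the passage text untouched regardless of how scores are computed, which is the contrast with rewriting-based approaches that the abstract draws.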

Business Value

Enables more reliable and accurate multilingual information retrieval and question answering, crucial for global businesses and applications serving diverse language users.