Abstract
Multilingual Retrieval-Augmented Generation (mRAG) often retrieves English
documents and translates them into the query language for low-resource
settings. However, poor translation quality degrades response generation
performance. Existing approaches either assume sufficient translation quality
or rely on rewriting, which introduces factual distortion and
hallucinations. To mitigate these problems, we propose Quality-Aware
Translation Tagging in mRAG (QTT-RAG), which explicitly evaluates translation
quality along three dimensions (semantic equivalence, grammatical accuracy, and
naturalness and fluency) and attaches these scores as metadata without altering the
original content. We evaluate QTT-RAG against CrossRAG and DKM-RAG as baselines
on two open-domain QA benchmarks (XORQA, MKQA) using six instruction-tuned LLMs
ranging from 2.4B to 14B parameters, covering two low-resource languages
(Korean and Finnish) and one high-resource language (Chinese). QTT-RAG
outperforms the baselines by preserving factual integrity while enabling
generator models to make informed decisions based on translation reliability.
This approach enables effective use of cross-lingual documents in
low-resource settings where native-language documents are scarce, offering a
practical and robust solution across multilingual domains.
Authors
Hoyeon Moon
Byeolhee Kim
Nikhil Verma
Submitted
October 27, 2025
Key Contributions
Introduces Quality-Aware Translation Tagging (QTT-RAG) for multilingual RAG systems. It explicitly evaluates translation quality across semantic equivalence, grammatical accuracy, and naturalness and fluency, attaching these scores as metadata without altering the original content, thereby mitigating performance degradation from poor translations.
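The tagging idea can be sketched as a small data structure plus a prompt builder. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1 to 5 score scale, the names `TaggedDocument`, `render_tag`, and `build_prompt`, the tag wording, and the prompt instruction are all hypothetical, and the scores are assumed to come from some external evaluator (for example an LLM judge) that is not shown. What the sketch does illustrate is the property described above: the translated passage is passed to the generator verbatim, and translation quality is conveyed only as attached metadata.

```python
from dataclasses import dataclass

# Hypothetical score scale; the actual scale used by QTT-RAG is an assumption here.
MIN_SCORE, MAX_SCORE = 1, 5

# The three quality dimensions named in the abstract (identifiers are hypothetical).
DIMENSIONS = ("semantic_equivalence", "grammatical_accuracy", "naturalness_fluency")


@dataclass(frozen=True)
class TaggedDocument:
    """A translated passage plus translation-quality metadata (hypothetical schema)."""
    source_text: str        # original English passage as retrieved
    translated_text: str    # translation into the query language, kept verbatim
    scores: dict[str, int]  # one score per quality dimension, from an external evaluator

    def __post_init__(self) -> None:
        # Require a score for every dimension.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing scores for: {sorted(missing)}")
        # Reject scores outside the assumed scale.
        for name, value in self.scores.items():
            if not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{name}={value} outside [{MIN_SCORE}, {MAX_SCORE}]")


def render_tag(doc: TaggedDocument) -> str:
    """Format the scores as a one-line tag; the passage text itself is not modified."""
    parts = [f"{name}={doc.scores[name]}/{MAX_SCORE}" for name in DIMENSIONS]
    return "[translation quality: " + ", ".join(parts) + "]"


def build_prompt(question: str, docs: list[TaggedDocument]) -> str:
    """Assemble a generator prompt showing each passage under its quality tag."""
    blocks = [
        f"Document {i} {render_tag(doc)}\n{doc.translated_text}"
        for i, doc in enumerate(docs, start=1)
    ]
    context = "\n\n".join(blocks)
    # Hypothetical instruction asking the generator to weigh documents by reliability.
    return (
        "Use the documents below. Each carries translation-quality scores; "
        "rely less on documents with low scores.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    doc = TaggedDocument(
        source_text="Helsinki is the capital of Finland.",
        translated_text="Helsinki on Suomen pääkaupunki.",
        scores={"semantic_equivalence": 5, "grammatical_accuracy": 5, "naturalness_fluency": 4},
    )
    print(render_tag(doc))
    # [translation quality: semantic_equivalence=5/5, grammatical_accuracy=5/5, naturalness_fluency=4/5]
    print(build_prompt("Mikä on Suomen pääkaupunki?", [doc]))
```

Keeping the scores in a separate field, rather than editing or rewriting the translation, mirrors the contrast the abstract draws with rewriting-based approaches: a low score warns the generator without any risk of the tagging step itself distorting the facts in the passage.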
Business Value
Enables more reliable and accurate multilingual question answering over cross-lingual documents, which matters for global businesses and applications serving users across many languages.