
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation

Abstract

Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation evaluation paradigm (MQM), which we call MQM re-annotation. In this setup, an MQM annotator reviews and edits a set of pre-existing MQM annotations, which may have come from the annotator themselves, another human annotator, or an automatic MQM annotation system. We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation results in higher-quality annotations, mostly due to finding errors that were missed during the first pass.
Authors (6)
Parker Riley
Daniel Deutsch
Mara Finkelstein
Colten DiIanni
Juraj Juraska
Markus Freitag
Submitted: October 28, 2025
arXiv Category: cs.CL

Key Contributions

This paper proposes and evaluates MQM re-annotation, a two-stage human evaluation process for machine translation in which annotators review and edit existing MQM annotations. The study demonstrates that this re-annotation process leads to higher-quality annotations, chiefly by identifying errors missed in the initial pass, thereby reducing evaluation noise and providing a more reliable measure of translation quality.
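To make the two-stage workflow concrete, the sketch below models it as data plus a review step. This is a hypothetical illustration, not the paper's implementation or annotation schema: the names `MQMError`, `Segment`, `re_annotate`, and the severity weights (minor = 1, major = 5) are assumptions chosen to mirror common MQM conventions. The key idea it captures is that each segment carries a first-pass error list, and a reviewer (the same annotator, another human, or an automatic system) may keep, edit, delete, or add error spans before the severity-weighted score is computed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed MQM-style severity weights; placeholders, not the paper's exact scheme.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


@dataclass
class MQMError:
    """One annotated error span in a translated segment."""
    start: int        # character offset where the error span begins
    end: int          # character offset where the error span ends (exclusive)
    category: str     # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str     # "minor" or "major"


@dataclass
class Segment:
    """A source/translation pair plus its current set of MQM annotations."""
    source: str
    translation: str
    errors: List[MQMError] = field(default_factory=list)


def mqm_penalty(segment: Segment) -> float:
    """Total severity-weighted penalty for a segment (lower is better)."""
    return sum(SEVERITY_WEIGHTS[e.severity] for e in segment.errors)


# A reviewer takes a segment (with its first-pass errors) and returns a revised error list.
Reviewer = Callable[[Segment], List[MQMError]]


def re_annotate(segment: Segment, reviewer: Reviewer) -> Segment:
    """Second-stage pass: the reviewer sees the first-pass error list and may
    keep, edit, delete, or add spans before the segment is scored."""
    revised = reviewer(segment)
    return Segment(segment.source, segment.translation, revised)


if __name__ == "__main__":
    # First pass: one annotated error ("tomorrow" mistranslates "gestern").
    seg = Segment(
        source="Der Vertrag wurde gestern unterschrieben.",
        translation="The contract was signed tomorrow.",
        errors=[MQMError(24, 32, "accuracy/mistranslation", "major")],
    )

    # Toy reviewer: keeps existing spans and adds one it believes was missed
    # (the added span here is purely illustrative).
    def reviewer(s: Segment) -> List[MQMError]:
        revised = list(s.errors)
        revised.append(MQMError(0, 3, "fluency/grammar", "minor"))
        return revised

    revised_seg = re_annotate(seg, reviewer)
    print(mqm_penalty(seg), "->", mqm_penalty(revised_seg))  # 5.0 -> 6.0
```

In this framing, the paper's finding corresponds to the second pass mostly adding spans rather than deleting them: the re-annotated penalty tends to grow because errors missed in the first pass are surfaced, which is what makes the final annotations more complete.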

Business Value

More reliable and accurate evaluation of machine translation systems matters for businesses that rely on MT for localization, content creation, and customer support: it ensures that quality improvements are measured accurately and that deployed systems actually meet business needs.