Abstract
Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation evaluation paradigm (MQM), which we call MQM re-annotation. In this setup, an MQM annotator reviews and edits a set of pre-existing MQM annotations that may have come from themselves, another human annotator, or an automatic MQM annotation system. We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation results in higher-quality annotations, mostly due to finding errors that were missed during the first pass.
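To make the two-stage setup concrete, the sketch below models MQM error spans and a re-annotation pass in Python. It is an illustrative assumption, not the paper's actual annotation tooling: the MQMError fields, the severity weights (a common MQM convention of major = 5, minor = 1), and the re_annotate helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified record for a single MQM error annotation.
@dataclass
class MQMError:
    span: str       # offending text span in the translation
    category: str   # e.g. "Accuracy/Mistranslation"
    severity: str   # "major" or "minor"

# Severity weights follow a common MQM convention (major = 5, minor = 1);
# the paper's exact weighting scheme is not reproduced here.
SEVERITY_WEIGHTS = {"major": 5.0, "minor": 1.0}

def mqm_score(errors: List[MQMError]) -> float:
    """Total error penalty for one segment (lower is better)."""
    return sum(SEVERITY_WEIGHTS[e.severity] for e in errors)

def re_annotate(first_pass: List[MQMError],
                added: List[MQMError],
                removed: List[MQMError]) -> List[MQMError]:
    """Second stage: the re-annotator keeps first-pass errors they agree
    with, drops the ones they judge spurious, and adds errors that were
    missed in the first pass."""
    kept = [e for e in first_pass if e not in removed]
    return kept + added

# Toy example: the re-annotator confirms the existing major error and
# adds one minor error that the first pass missed.
first_pass = [MQMError("gone to", "Accuracy/Mistranslation", "major")]
second_pass = re_annotate(
    first_pass,
    added=[MQMError("the the", "Fluency/Grammar", "minor")],
    removed=[],
)
print(mqm_score(first_pass), mqm_score(second_pass))  # 5.0 6.0
```

In this toy case the second pass increases the segment's total penalty because a missed error is recovered, which mirrors the paper's finding that re-annotation improves quality mainly by surfacing errors overlooked in the first pass.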
Authors (6)
Parker Riley
Daniel Deutsch
Mara Finkelstein
Colten DiIanni
Juraj Juraska
Markus Freitag
Submitted
October 28, 2025
Key Contributions
This paper proposes and evaluates MQM re-annotation, a two-stage human evaluation process for machine translation in which annotators review and edit existing MQM annotations. The study demonstrates that re-annotation yields higher-quality annotations by catching errors missed in the initial pass, reducing evaluation noise and providing a more reliable measure of translation quality.
Business Value
More reliable and accurate evaluation of machine translation systems is crucial for businesses that rely on MT for localization, content creation, and customer support. It ensures that quality improvements are measured accurately and that deployed systems meet business needs.