Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Direct Preference Optimization (DPO) is a powerful paradigm for aligning
Large Language Models (LLMs) to human preferences in Machine Translation (MT),
but current methods are hindered by two fundamental challenges: (1) flawed
reward signals from Quality Estimation (QE) models that overlook critical
errors like translation hallucination, and (2) inefficient data utilization
that discards valuable learning signals by selecting only a single win-loss
pair. To address these limitations, we introduce M^2PO: Multi-Pair,
Multi-Perspective Preference Optimization. Our framework integrates a
multi-perspective reward engine that creates a more robust signal by combining
two key viewpoints: a new hallucination penalty for factuality, and an
innovative dynamic quality score that adaptively fuses external evaluations
with the model's own evolving judgment. This is synergistically paired with a
multi-pair construction strategy that systematically creates a comprehensive
set of preference pairs from the entire pool of translation candidates. This
synergistic approach ensures the model learns from a richer spectrum of quality
trade-offs, leading to more robust and faithful translations. On challenging
WMT21-22 benchmarks, M^2PO substantially outperforms existing preference
optimization methods and demonstrates highly competitive performance against
leading proprietary LLMs.
Key Contributions
M^2PO enhances Direct Preference Optimization (DPO) for Machine Translation by addressing flawed reward signals and inefficient data utilization. It introduces a multi-perspective reward engine with a hallucination penalty and dynamic quality score, and a multi-pair construction strategy to create comprehensive preference sets, leading to more robust alignment and improved translation quality.
Business Value
Improves the quality and reliability of machine translation systems, leading to better cross-lingual communication for businesses and individuals, and reducing costs associated with human post-editing.