📄 Abstract
Large language models (LLMs) are highly sensitive to their input prompts,
making prompt design a central challenge. While automatic prompt optimization
(APO) reduces manual engineering, most approaches assume access to ground-truth
references such as labeled validation data. In practice, however, collecting
high-quality labels is costly and slow. We propose the Prompt Duel Optimizer
(PDO), a sample-efficient framework for label-free prompt optimization. PDO
formulates the problem as a dueling-bandit setting, where the supervision signal
comes from pairwise preference feedback provided by an LLM judge. The framework
combines Double Thompson Sampling (D-TS), which prioritizes informative prompt
comparisons, with Top-Performer Guided Mutation, which expands the candidate
pool by mutating high-performing prompts. PDO naturally operates in label-free
settings and can also incorporate partial labels to mitigate judge noise.
Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently
outperforms baseline methods. Ablation studies further demonstrate the
effectiveness of both D-TS and prompt mutation.
Authors (9)
Yuanchen Wu
Saurabh Verma
Justin Lee
Fangzhou Xiong
Poppy Zhang
Amel Awadelkarim
+3 more
Submitted
October 14, 2025
Key Contributions
Introduces the Prompt Duel Optimizer (PDO), a sample-efficient, label-free framework for optimizing LLM prompts. PDO uses an LLM judge for pairwise preference feedback within a dueling-bandit setting, combining Double Thompson Sampling with mutation strategies to efficiently explore the prompt space and achieve strong performance on benchmarks like BBH and MS MARCO.
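The dueling-bandit selection loop described above can be sketched as follows. This is a simplified illustration of Double Thompson Sampling over a fixed prompt pool, with assumed names (`PromptDuelOptimizer`, `select_duel`, `update`); it is not the authors' implementation, and it omits the Top-Performer Guided Mutation step and the LLM judge itself, which would supply the duel outcomes.

```python
import random
from collections import defaultdict

class PromptDuelOptimizer:
    """Simplified D-TS sketch for label-free prompt selection.
    Hypothetical API; the paper's actual framework may differ."""

    def __init__(self, prompts, seed=0):
        self.prompts = prompts
        self.rng = random.Random(seed)
        # wins[i][j]: number of judged duels in which prompt i beat prompt j
        self.wins = defaultdict(lambda: defaultdict(int))

    def _sample_theta(self, i, j):
        # Thompson sample of P(prompt i beats prompt j) from a Beta posterior
        w, l = self.wins[i][j], self.wins[j][i]
        return self.rng.betavariate(w + 1, l + 1)

    def select_duel(self):
        """Pick an informative pair of prompts to compare next."""
        n = len(self.prompts)
        # First arm: maximize the sampled Copeland score
        # (how many rivals the prompt is sampled to beat).
        theta = [[self._sample_theta(i, j) if i != j else 0.5
                  for j in range(n)] for i in range(n)]
        copeland = [sum(theta[i][j] > 0.5 for j in range(n) if j != i)
                    for i in range(n)]
        first = max(range(n), key=lambda i: copeland[i])
        # Second arm: resample and pick the strongest challenger to the first.
        second = max((j for j in range(n) if j != first),
                     key=lambda j: self._sample_theta(j, first))
        return first, second

    def update(self, winner, loser):
        # Record the LLM judge's pairwise preference for this duel.
        self.wins[winner][loser] += 1
```

In use, `select_duel` proposes a pair, the LLM judge declares a winner, and `update` records the outcome; as evidence accumulates, the Beta posteriors concentrate and the sampler focuses duels on the most promising prompts.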
Business Value
Enables faster and cheaper optimization of LLM prompts, leading to improved performance and reduced operational costs for applications relying on LLMs.