
LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

📄 Abstract

Large language models (LLMs) are highly sensitive to their input prompts, making prompt design a central challenge. While automatic prompt optimization (APO) reduces manual engineering, most approaches assume access to ground-truth references such as labeled validation data. In practice, however, collecting high-quality labels is costly and slow. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization. PDO formulates the problem as a dueling-bandit setting, where the supervision signal comes from pairwise preference feedback provided by an LLM judge. The framework combines Double Thompson Sampling (D-TS), which prioritizes informative prompt comparisons, with Top-Performer Guided Mutation, which expands the candidate pool by mutating high-performing prompts. PDO naturally operates in label-free settings and can also incorporate partial labels to mitigate judge noise. Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently outperforms baseline methods. Ablation studies further demonstrate the effectiveness of both D-TS and prompt mutation.
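The dueling-bandit loop at the heart of PDO can be illustrated with a short sketch. The following is a minimal, simplified rendering of Double Thompson Sampling over a fixed pool of prompt candidates, not the paper's reference implementation: `prompts`, `llm_judge`, and `n_rounds` are assumed names, and the judge is modeled as a single pairwise-preference call that returns whether the first prompt is preferred on a sampled task input.

```python
# Minimal D-TS sketch for prompt dueling (illustrative assumptions, not
# the paper's actual API): llm_judge(a, b) -> True if prompt a is preferred.
import numpy as np

def dts_duel_loop(prompts, llm_judge, n_rounds=200, rng=None):
    rng = rng or np.random.default_rng()
    k = len(prompts)
    wins = np.zeros((k, k))  # wins[i, j]: times prompt i beat prompt j

    for _ in range(n_rounds):
        # Sample a plausible pairwise-preference matrix from Beta posteriors.
        theta = rng.beta(wins + 1, wins.T + 1)
        np.fill_diagonal(theta, 0.5)
        # First duelist: highest sampled Copeland score (number of
        # opponents it beats under the sampled preference matrix).
        copeland = (theta > 0.5).sum(axis=1)
        first = int(np.argmax(copeland))
        # Second duelist: resample preferences against `first` and pick the
        # strongest plausible challenger, i.e. an informative comparison.
        theta2 = rng.beta(wins[:, first] + 1, wins[first, :] + 1)
        theta2[first] = -np.inf
        second = int(np.argmax(theta2))
        # Run the duel with the LLM judge and update the win counts.
        if llm_judge(prompts[first], prompts[second]):
            wins[first, second] += 1
        else:
            wins[second, first] += 1

    # Rank prompts by empirical Copeland score over observed duels.
    scores = (wins / np.maximum(wins + wins.T, 1) > 0.5).sum(axis=1)
    return [prompts[i] for i in np.argsort(-scores)], wins
```

In PDO the judge is itself an LLM call; the abstract also notes that partial labels, when available, can be incorporated to mitigate judge noise, for instance by replacing the judge's verdict on labeled examples.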
Authors (9)
Yuanchen Wu
Saurabh Verma
Justin Lee
Fangzhou Xiong
Poppy Zhang
Amel Awadelkarim
+3 more
Submitted: October 14, 2025
arXiv Category: cs.CL

Key Contributions

Introduces the Prompt Duel Optimizer (PDO), a sample-efficient, label-free framework for optimizing LLM prompts. PDO uses an LLM judge for pairwise preference feedback within a dueling-bandit setting, combining Double Thompson Sampling (D-TS) with Top-Performer Guided Mutation to efficiently explore the prompt space, achieving strong performance on benchmarks such as BBH and MS MARCO.
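The mutation side can be sketched in the same spirit. The snippet below is an illustrative take on Top-Performer Guided Mutation under the assumption that an `llm_generate` callable rewrites a prompt from a meta-prompt; the rewrite instruction, `top_k`, and the win-rate ranking heuristic are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative Top-Performer Guided Mutation: rank prompts by empirical
# win rate, then ask an LLM to rewrite the leaders into new candidates.
import numpy as np

def mutate_top_performers(prompts, wins, llm_generate, top_k=2, n_variants=2):
    # Total duels each prompt has fought, guarding against division by zero.
    duels = np.maximum((wins + wins.T).sum(axis=1), 1)
    win_rate = wins.sum(axis=1) / duels
    top = np.argsort(-win_rate)[:top_k]
    variants = []
    for i in top:
        for _ in range(n_variants):
            variants.append(llm_generate(
                "Rewrite the following prompt to improve task performance "
                "while preserving its intent:\n" + prompts[i]))
    return prompts + variants
```

New variants would then join the candidate pool for subsequent D-TS rounds, so exploration concentrates around prompts that already win duels.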

Business Value

Enables faster and cheaper optimization of LLM prompts without labeled validation data, improving performance and reducing operational costs for applications built on LLMs.