📄 Abstract
Large language models (LLMs) are highly sensitive to their input prompts,
making prompt design a central challenge. While automatic prompt optimization
(APO) reduces manual engineering, most approaches assume access to ground-truth
references such as labeled validation data. In practice, however, collecting
high-quality labels is costly and slow. We propose the Prompt Duel Optimizer
(PDO), a sample-efficient framework for label-free prompt optimization. PDO
formulates the problem as a dueling-bandit setting, where the supervision signal
comes from pairwise preference feedback provided by an LLM judge. The framework
combines Double Thompson Sampling (D-TS), which prioritizes informative prompt
comparisons, with Top-Performer Guided Mutation, which expands the candidate
pool by mutating high-performing prompts. PDO naturally operates in label-free
settings and can also incorporate partial labels to mitigate judge noise.
Experiments on BIG-bench Hard (BBH) and MS MARCO show that PDO consistently
outperforms baseline methods. Ablation studies further demonstrate the
effectiveness of both D-TS and prompt mutation.
Authors (9)
Yuanchen Wu
Saurabh Verma
Justin Lee
Fangzhou Xiong
Poppy Zhang
Amel Awadelkarim
+3 more
Submitted
October 14, 2025
Key Contributions
Introduces the Prompt Duel Optimizer (PDO), a sample-efficient, label-free framework for optimizing LLM prompts. PDO uses an LLM judge for pairwise preference feedback within a dueling-bandit setting, combining Double Thompson Sampling with mutation strategies to efficiently explore the prompt space and achieve strong performance on benchmarks like BBH and MS MARCO.
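The dueling-bandit selection loop described above can be sketched as follows. This is a simplified illustration of Double Thompson Sampling over a fixed prompt pool, with assumed names (`PromptDuelOptimizer`, `select_duel`, `update`); it is not the authors' implementation, and it omits the Top-Performer Guided Mutation step and the LLM judge itself, which would supply the duel outcomes.

```python
import random
from collections import defaultdict

class PromptDuelOptimizer:
    """Simplified D-TS sketch for label-free prompt selection.
    Hypothetical API; the paper's actual framework may differ."""

    def __init__(self, prompts, seed=0):
        self.prompts = prompts
        self.rng = random.Random(seed)
        # wins[i][j]: number of judged duels in which prompt i beat prompt j
        self.wins = defaultdict(lambda: defaultdict(int))

    def _sample_theta(self, i, j):
        # Thompson sample of P(prompt i beats prompt j) from a Beta posterior
        w, l = self.wins[i][j], self.wins[j][i]
        return self.rng.betavariate(w + 1, l + 1)

    def select_duel(self):
        """Pick an informative pair of prompts to compare next."""
        n = len(self.prompts)
        # First arm: maximize the sampled Copeland score
        # (how many rivals the prompt is sampled to beat).
        theta = [[self._sample_theta(i, j) if i != j else 0.5
                  for j in range(n)] for i in range(n)]
        copeland = [sum(theta[i][j] > 0.5 for j in range(n) if j != i)
                    for i in range(n)]
        first = max(range(n), key=lambda i: copeland[i])
        # Second arm: resample and pick the strongest challenger to the first.
        second = max((j for j in range(n) if j != first),
                     key=lambda j: self._sample_theta(j, first))
        return first, second

    def update(self, winner, loser):
        # Record the LLM judge's pairwise preference for this duel.
        self.wins[winner][loser] += 1
```

In use, `select_duel` proposes a pair, the LLM judge declares a winner, and `update` records the outcome; as evidence accumulates, the Beta posteriors concentrate and the sampler focuses duels on the most promising prompts.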
Business Value
Enables faster and cheaper optimization of LLM prompts, leading to improved performance and reduced operational costs for applications relying on LLMs.