
WildIFEval: Instruction Following in the Wild

📄 Abstract

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval - a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, extracted from natural user instructions. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. WildIFEval clearly differentiates between small and large models, and demonstrates that all models have substantial room for improvement on such tasks. We analyze the effects of the number and type of constraints on performance, revealing interesting patterns of model constraint-following behavior. We release our dataset to promote further research on instruction-following under complex, realistic conditions.
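The evaluation style the abstract describes - decomposing an instruction into individual constraints and scoring a response per constraint - can be sketched as below. This is a minimal illustration, not the paper's actual implementation: the `satisfies` keyword check is a hypothetical stand-in for an LLM-as-judge, and the constraint decomposition shown is invented for the example.

```python
# Sketch of per-constraint scoring for multi-constraint instructions.
# Assumption: the trivial keyword check below stands in for an LLM judge;
# WildIFEval's real judging procedure may differ.

def satisfies(response: str, constraint: str) -> bool:
    """Toy stand-in judge: treat a constraint as a required keyword."""
    return constraint.lower() in response.lower()

def constraint_score(response: str, constraints: list[str]) -> float:
    """Fraction of constraints the response satisfies (mean over constraints)."""
    if not constraints:
        return 1.0
    return sum(satisfies(response, c) for c in constraints) / len(constraints)

# Hypothetical instruction decomposed into three constraints.
constraints = ["haiku", "rain", "leaves"]
response = (
    "Cold rain taps the roof / a haiku drifts on the wind / "
    "down with the last leaves"
)
print(constraint_score(response, constraints))  # 1.0: all three satisfied
```

Averaging per-constraint scores (rather than an all-or-nothing pass) is what makes it possible to study how performance degrades as the number of constraints grows, one of the analyses the paper reports.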

Key Contributions

Introduces WildIFEval, a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions, spanning a broad lexical and topical spectrum. It benchmarks leading LLMs, revealing significant room for improvement and analyzing the effects of constraint number and type on performance.

Business Value

Enables the development of more capable and reliable AI assistants and applications that can understand and execute complex user requests accurately, improving user experience and task completion rates.