Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

Abstract

Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14-billion-parameter model outperforms frontier-class models (85% vs. 78% accuracy). In addition, we explore turn-restricted regimes, during training and at test time, which show that these agents achieve better results when allowed to operate over longer multi-turn horizons.
Authors: Vivek Kalyan, Martin Andrews
Submitted: October 28, 2025
arXiv Category: cs.CL

Key Contributions

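This work demonstrates that Reinforcement Learning, by learning from experience, pushes the capabilities of LLM agents on long-horizon, multi-turn tasks beyond what strong prompt-based approaches achieve. On a legal document search benchmark, the RL-trained 14-billion-parameter model reached 85% accuracy, compared with 78% for frontier-class models. Experiments with turn-restricted regimes, during training and at test time, show that the agents perform better when allowed more turns.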
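To make the idea of a turn-restricted, multi-turn search episode concrete, here is a minimal sketch of a rollout loop with a turn cap. It is not the authors' implementation: the names `run_episode`, `scripted_policy`, `search`, and `CORPUS`, the two-document corpus, and the binary exact-match reward are all assumptions made for illustration, and no RL update is shown. In an RL setup, a learned policy would take the place of the scripted one, and the episode reward would be the training signal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical toy corpus standing in for a legal document collection.
CORPUS = {
    "doc-1": "commercial lease agreement",
    "doc-2": "notice of termination clause",
}

History = List[Tuple[str, List[str]]]  # (query, ids of matching documents)
Action = Tuple[str, str]               # ("search", query) or ("answer", doc id)


@dataclass
class Episode:
    turns_used: int
    answer: Optional[str]
    reward: float


def search(query: str) -> List[str]:
    """Return ids of documents whose text contains the query (toy tool)."""
    return [doc_id for doc_id, text in CORPUS.items() if query.lower() in text]


def scripted_policy(history: History) -> Action:
    """Stand-in for a learned policy: two searches, then an answer."""
    if len(history) == 0:
        return ("search", "lease")
    if len(history) == 1:
        return ("search", "termination")
    last_hits = history[-1][1]
    return ("answer", last_hits[0] if last_hits else "")


def run_episode(
    policy: Callable[[History], Action],
    tool: Callable[[str], List[str]],
    gold_answer: str,
    max_turns: int,
) -> Episode:
    """Roll out one episode; every search or answer consumes one turn."""
    history: History = []
    for turn in range(1, max_turns + 1):
        kind, text = policy(history)
        if kind == "answer":
            # Assumed reward: 1.0 for an exact match, otherwise 0.0.
            return Episode(turn, text, 1.0 if text == gold_answer else 0.0)
        history.append((text, tool(text)))
    # Turn budget exhausted before the agent committed to an answer.
    return Episode(max_turns, None, 0.0)


short = run_episode(scripted_policy, search, "doc-2", max_turns=2)
longer = run_episode(scripted_policy, search, "doc-2", max_turns=3)
print(short)
print(longer)
```

With a budget of two turns, this scripted agent runs out of turns before it answers and receives no reward; with three turns, it completes both searches and answers correctly. This only illustrates mechanically why a turn cap can limit an agent that needs several searches; it says nothing about the size of the effect reported in the paper.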

Business Value

RL training of this kind could enable more capable AI agents for complex information retrieval and task completion, particularly in specialized domains such as legal research.