
Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety

Abstract

As Large Language Model (LLM) agents increasingly operate in complex environments with real-world consequences, their safety becomes critical. While uncertainty quantification is well studied for single-turn tasks, multi-turn agentic scenarios with real-world tool access present unique challenges: uncertainties and ambiguities compound, leading to severe or catastrophic risks beyond traditional text-generation failures. We propose "quitting" as a simple yet effective behavioral mechanism for LLM agents to recognize and withdraw from situations where they lack confidence. Leveraging the ToolEmu framework, we conduct a systematic evaluation of quitting behavior across 12 state-of-the-art LLMs. Our results demonstrate a highly favorable safety-helpfulness trade-off: agents prompted with explicit quit instructions improve safety by an average of +0.39 on a 0-3 scale across all models (+0.64 for proprietary models), while helpfulness decreases by a negligible 0.03 on average. Simply adding explicit quit instructions is thus a highly effective safety mechanism that can be deployed immediately in existing agent systems, establishing quitting as an effective first-line defense for autonomous agents in high-stakes applications.
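Because the mechanism is purely a prompt addition, it can be sketched in a few lines. The snippet below is a minimal illustration of the idea rather than the authors' implementation; `call_llm` and `execute_tool_call` are hypothetical placeholders for a real chat-completion client and tool dispatcher, and the exact quit wording is an assumption.

```python
# A minimal sketch (not the paper's code) of explicit quit instructions:
# the agent's system prompt gains a quit escape hatch, and a declared quit
# is treated as a safe withdrawal instead of a tool call.

QUIT_INSTRUCTION = (
    "If you are uncertain about the user's intent, the safety of an action, "
    "or the current state of the environment, respond with exactly "
    "'QUIT: <reason>' instead of calling any tool."
)

def call_llm(system: str, user: str) -> str:
    """Hypothetical placeholder for an LLM chat-completion call."""
    return "QUIT: the request is ambiguous about which account to modify."

def execute_tool_call(response: str) -> str:
    """Hypothetical placeholder for dispatching a tool call the agent proposed."""
    return f"executed: {response}"

def run_agent_step(system_prompt: str, user_request: str) -> str:
    """Run one agent step with the quit escape hatch appended to the prompt."""
    response = call_llm(
        system=system_prompt + "\n\n" + QUIT_INSTRUCTION,
        user=user_request,
    )
    if response.strip().startswith("QUIT:"):
        # The agent chose to withdraw; surface the reason instead of acting.
        return "Agent quit before acting: " + response.strip()[5:].strip()
    return execute_tool_call(response)

if __name__ == "__main__":
    print(run_agent_step("You are a helpful banking assistant.",
                         "Move the money to the usual account."))
```

The design appeal the paper emphasizes is exactly this simplicity: the quit path requires no retraining or uncertainty estimator, only a prompt change and a check on the agent's output.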
Authors (4)
Vamshi Krishna Bonagiri
Ponnurangam Kumaraguru
Khanh Nguyen
Benjamin Plaut
Submitted: October 18, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces quitting as a simple behavioral safety mechanism for LLM agents: recognizing situations where confidence is low and withdrawing from them. Using the ToolEmu framework, the authors systematically evaluate the mechanism across 12 state-of-the-art LLMs, finding an average safety improvement of +0.39 on a 0-3 scale (+0.64 for proprietary models) at a negligible average helpfulness cost of -0.03.
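As a rough illustration of this evaluation pattern (not ToolEmu's actual API), the sketch below compares each model's scores with and without the quit instruction and reports the average deltas. `run_case` is a hypothetical stand-in for an emulated rollout graded on the paper's 0-3 safety and helpfulness scales; the placeholder scores it returns are invented for the demo.

```python
# Hedged sketch of a with/without-quit comparison, assuming a scorer exists.
from statistics import mean

def run_case(model: str, case: str, quit_enabled: bool) -> tuple[float, float]:
    """Hypothetical stand-in for one emulated rollout scored as (safety, helpfulness)."""
    # Placeholder scores; a real harness would emulate tools and grade the trace.
    return (2.5, 2.0) if quit_enabled else (2.1, 2.0)

def quit_instruction_deltas(model: str, cases: list[str]) -> tuple[float, float]:
    """Average (safety, helpfulness) change from enabling quit instructions."""
    base = [run_case(model, c, quit_enabled=False) for c in cases]
    quit_ = [run_case(model, c, quit_enabled=True) for c in cases]
    d_safety = mean(q[0] - b[0] for q, b in zip(quit_, base))
    d_help = mean(q[1] - b[1] for q, b in zip(quit_, base))
    return d_safety, d_help

if __name__ == "__main__":
    print(quit_instruction_deltas("some-model", [f"case-{i}" for i in range(5)]))
```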

Business Value

Enhances the reliability and trustworthiness of AI agents deployed in critical applications, reducing the risk of costly or dangerous failures. This can lead to wider adoption of LLM agents in sensitive industries.