
Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety

Abstract

As Large Language Model (LLM) agents increasingly operate in complex environments with real-world consequences, their safety becomes critical. While uncertainty quantification is well studied for single-turn tasks, multi-turn agentic scenarios with real-world tool access present unique challenges: uncertainties and ambiguities compound, leading to severe or catastrophic risks beyond traditional text-generation failures. We propose "quitting" as a simple yet effective behavioral mechanism for LLM agents to recognize and withdraw from situations where they lack confidence. Leveraging the ToolEmu framework, we conduct a systematic evaluation of quitting behavior across 12 state-of-the-art LLMs. Our results demonstrate a highly favorable safety-helpfulness trade-off: agents prompted with explicit quit instructions improve safety by an average of +0.39 on a 0-3 scale across all models (+0.64 for proprietary models), while helpfulness decreases by a negligible 0.03 on average. Simply adding explicit quit instructions is thus a highly effective safety mechanism that can be deployed immediately in existing agent systems, establishing quitting as an effective first-line defense for autonomous agents in high-stakes applications.
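Because the mechanism is purely a prompt addition, it can be sketched in a few lines. The snippet below is a minimal illustration of the idea rather than the authors' implementation; `call_llm` and `execute_tool_call` are hypothetical placeholders for a real chat-completion client and tool dispatcher, and the exact quit wording is an assumption.

```python
# A minimal sketch (not the paper's code) of explicit quit instructions:
# the agent's system prompt gains a quit escape hatch, and a declared quit
# is treated as a safe withdrawal instead of a tool call.

QUIT_INSTRUCTION = (
    "If you are uncertain about the user's intent, the safety of an action, "
    "or the current state of the environment, respond with exactly "
    "'QUIT: <reason>' instead of calling any tool."
)

def call_llm(system: str, user: str) -> str:
    """Hypothetical placeholder for an LLM chat-completion call."""
    return "QUIT: the request is ambiguous about which account to modify."

def execute_tool_call(response: str) -> str:
    """Hypothetical placeholder for dispatching a tool call the agent proposed."""
    return f"executed: {response}"

def run_agent_step(system_prompt: str, user_request: str) -> str:
    """Run one agent step with the quit escape hatch appended to the prompt."""
    response = call_llm(
        system=system_prompt + "\n\n" + QUIT_INSTRUCTION,
        user=user_request,
    )
    if response.strip().startswith("QUIT:"):
        # The agent chose to withdraw; surface the reason instead of acting.
        return "Agent quit before acting: " + response.strip()[5:].strip()
    return execute_tool_call(response)

if __name__ == "__main__":
    print(run_agent_step("You are a helpful banking assistant.",
                         "Move the money to the usual account."))
```

The design appeal the paper emphasizes is exactly this simplicity: the quit path requires no retraining or uncertainty estimator, only a prompt change and a check on the agent's output.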
Authors (4)
Vamshi Krishna Bonagiri
Ponnurangam Kumaraguru
Khanh Nguyen
Benjamin Plaut
Submitted: October 18, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces quitting as a simple behavioral safety mechanism for LLM agents: recognizing situations where confidence is low and withdrawing from them. Using the ToolEmu framework, the authors systematically evaluate the mechanism across 12 state-of-the-art LLMs, finding an average safety improvement of +0.39 on a 0-3 scale (+0.64 for proprietary models) at a negligible average helpfulness cost of -0.03.
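As a rough illustration of this evaluation pattern (not ToolEmu's actual API), the sketch below compares each model's scores with and without the quit instruction and reports the average deltas. `run_case` is a hypothetical stand-in for an emulated rollout graded on the paper's 0-3 safety and helpfulness scales; the placeholder scores it returns are invented for the demo.

```python
# Hedged sketch of a with/without-quit comparison, assuming a scorer exists.
from statistics import mean

def run_case(model: str, case: str, quit_enabled: bool) -> tuple[float, float]:
    """Hypothetical stand-in for one emulated rollout scored as (safety, helpfulness)."""
    # Placeholder scores; a real harness would emulate tools and grade the trace.
    return (2.5, 2.0) if quit_enabled else (2.1, 2.0)

def quit_instruction_deltas(model: str, cases: list[str]) -> tuple[float, float]:
    """Average (safety, helpfulness) change from enabling quit instructions."""
    base = [run_case(model, c, quit_enabled=False) for c in cases]
    quit_ = [run_case(model, c, quit_enabled=True) for c in cases]
    d_safety = mean(q[0] - b[0] for q, b in zip(quit_, base))
    d_help = mean(q[1] - b[1] for q, b in zip(quit_, base))
    return d_safety, d_help

if __name__ == "__main__":
    print(quit_instruction_deltas("some-model", [f"case-{i}" for i in range(5)]))
```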

Business Value

Enhances the reliability and trustworthiness of AI agents deployed in critical applications, reducing the risk of costly or dangerous failures. This can lead to wider adoption of LLM agents in sensitive industries.