📄 Abstract
Recent advances in reinforcement learning (RL) enable its use on increasingly
complex tasks, but the lack of formal safety guarantees still limits its
application in safety-critical settings. A common practical approach is to
augment the RL policy with a safety filter that overrides unsafe actions to
prevent failures during both training and deployment. However, safety filtering
is often perceived as sacrificing performance and hindering the learning
process. We show that this perceived safety-performance tradeoff is not
inherent and prove, for the first time, that enforcing safety with a
sufficiently permissive safety filter does not degrade asymptotic performance.
We formalize RL safety with a safety-critical Markov decision process (SC-MDP),
which requires categorical, rather than high-probability, avoidance of
catastrophic failure states. Additionally, we define an associated filtered MDP
in which all actions result in safe effects, thanks to a safety filter that is
considered to be a part of the environment. Our main theorem establishes that
(i) learning in the filtered MDP is safe categorically, (ii) standard RL
convergence carries over to the filtered MDP, and (iii) any policy that is
optimal in the filtered MDP, when executed through the same filter, achieves the
same asymptotic return as the best safe policy in the SC-MDP, yielding a
complete separation between safety enforcement and performance optimization. We
validate the theory on Safety Gymnasium with representative tasks and
constraints, observing zero violations during training and final performance
matching or exceeding unfiltered baselines. Together, these results shed light
on a long-standing question in safety-filtered learning and provide a simple,
principled recipe for safe RL: train and deploy RL policies with the most
permissive safety filter that is available.
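The "filtered MDP" described above treats the safety filter as part of the environment: any action the learner proposes is replaced by a safe fallback before it reaches the underlying system. The snippet below is a minimal sketch of that construction as a Gymnasium wrapper, not the authors' implementation; `is_action_safe` and `fallback_action` are hypothetical placeholders for a concrete safety filter (e.g., one derived from a reachability analysis).

```python
# Sketch only: a Gymnasium wrapper that realizes a "filtered MDP" by overriding
# unsafe actions with a safe fallback. The filter functions passed in are
# assumptions for illustration, not part of the paper's API.
import gymnasium as gym


class SafetyFilterWrapper(gym.Wrapper):
    def __init__(self, env, is_action_safe, fallback_action):
        super().__init__(env)
        self._is_action_safe = is_action_safe    # (obs, action) -> bool
        self._fallback_action = fallback_action  # obs -> safe action
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        # The filter is part of the environment: the learner's action is
        # swapped for a safe fallback whenever it would be unsafe.
        if not self._is_action_safe(self._last_obs, action):
            action = self._fallback_action(self._last_obs)
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info
```

Under this reading, any off-the-shelf RL algorithm can be trained directly on the wrapped environment, which is the recipe the abstract suggests: train and deploy through the most permissive safety filter available, and let the filter alone handle safety.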
Authors (4)
Donggeon David Oh
Duy P. Nguyen
Haimin Hu
Jaime F. Fisac
Submitted
October 20, 2025
Key Contributions
Proves that enforcing safety with a sufficiently permissive safety filter in RL does not degrade asymptotic performance. Formalizes RL safety using SC-MDP and defines a filtered MDP, demonstrating that the perceived safety-performance tradeoff is not inherent.
Business Value
Enables the safe deployment of RL agents in safety-critical applications like autonomous vehicles, industrial automation, and healthcare, increasing trust and adoption of AI systems.