
Online Optimization for Offline Safe Reinforcement Learning

Abstract

We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at https://github.com/yassineCh/O3SRL.
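
To make the minimax framing concrete: a standard route to such an objective is a Lagrangian relaxation of the constrained problem, in which the policy player is handled by offline RL and the dual player by no-regret online optimization. The form below is a conventional reconstruction from the abstract (with reward return J_r, cost return J_c, and cost budget b), not necessarily the paper's exact formulation:

```latex
% Constrained problem: maximize reward return subject to a cost budget b
%   \max_{\pi} J_r(\pi) \quad \text{s.t.} \quad J_c(\pi) \le b
% Lagrangian relaxation as a minimax game: offline RL plays the inner
% maximization over policies, while no-regret online optimization
% plays the dual variable \lambda:
\min_{\lambda \ge 0} \, \max_{\pi} \; J_r(\pi) - \lambda \bigl( J_c(\pi) - b \bigr)
```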
Authors (5)
Yassine Chemingui
Aryan Deshwal
Alan Fern
Thanh Nguyen-Tang
Janardhan Rao Doppa
Submitted: October 24, 2025
arXiv Category: cs.LG

Key Contributions

This paper proposes a new approach to Offline Safe Reinforcement Learning (OSRL) that frames the problem as a minimax objective and solves it by combining offline RL with online optimization. The authors prove approximate optimality when the method is paired with an approximate offline RL oracle and a no-regret online optimizer, and they derive a practical variant that works with any offline RL algorithm by bypassing offline policy evaluation (see the sketch below). On the DSRL benchmark, the method reliably enforces safety constraints under stringent cost budgets while achieving high rewards.
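
As a rough illustration of this loop, the sketch below alternates an offline RL oracle with a projected online gradient update of the Lagrange multiplier. All names here are hypothetical stand-ins: `offline_rl` and `estimate_cost` are caller-supplied placeholders rather than the paper's API, and the step size and clipping bound are illustrative. Note that the `estimate_cost` step is exactly the offline policy evaluation the paper's practical variant avoids.

```python
import numpy as np

def lagrangian_osrl(dataset, budget, offline_rl, estimate_cost,
                    rounds=50, lr=0.1, lam_max=10.0):
    """Sketch of an oracle-based minimax loop for OSRL (hypothetical,
    not the paper's implementation).

    offline_rl(dataset, reward_fn) -> policy trained on shaped rewards
    estimate_cost(policy, dataset) -> estimated cumulative cost of policy
    """
    lam = 0.0
    policies = []
    for _ in range(rounds):
        # Inner step: any offline RL algorithm trained on the Lagrangian
        # reward r - lam * c for the current multiplier (captured by value).
        policy = offline_rl(dataset,
                            reward_fn=lambda r, c, lam=lam: r - lam * c)
        policies.append(policy)

        # Outer step: projected online gradient ascent on the dual variable,
        # driven by the estimated constraint violation. This is where offline
        # policy evaluation enters; the paper's practical variant removes it.
        violation = estimate_cost(policy, dataset) - budget
        lam = float(np.clip(lam + lr * violation, 0.0, lam_max))

    # No-regret analyses typically certify the uniform mixture of the
    # iterates rather than the final policy alone.
    return policies
```

Deploying the result as the uniform mixture over `policies` matches the usual no-regret-to-saddle-point argument, under which the average iterate approximately solves the minimax problem.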

Business Value

Enables the development of safer autonomous systems and decision-making agents that learn from existing data without requiring online interaction, a capability that is crucial for high-stakes applications such as autonomous driving and medical treatment planning.
