This paper proposes a novel approach to Offline Safe Reinforcement Learning (OSRL) by framing it as a minimax objective solved via online optimization. It proves approximate optimality and offers a practical approximation that bypasses offline policy evaluation, demonstrating reliable safety-constraint satisfaction and high reward on the DSRL benchmark.
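To make the minimax framing concrete, here is a minimal, generic sketch of primal-dual online optimization on a toy constrained problem: maximize a reward while keeping a cost below a limit via a Lagrangian saddle point. The specific objective, step sizes, and update rules below are illustrative assumptions, not the paper's actual algorithm.

```python
def primal_dual(steps=2000, lr=0.05, cost_limit=1.0):
    """Toy Lagrangian minimax: max_theta min_{lam>=0} is approached by
    alternating gradient steps on L(theta, lam) = R(theta) - lam * (C(theta) - limit).

    Assumed toy problem (not from the paper):
      reward R(theta) = -(theta - 2)**2   (unconstrained optimum at theta = 2)
      cost   C(theta) = theta             (constraint theta <= cost_limit)
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal ascent on the Lagrangian: dL/dtheta = -2*(theta - 2) - lam
        theta += lr * (-2.0 * (theta - 2.0) - lam)
        # Dual ascent on the constraint violation, projected onto lam >= 0
        lam = max(0.0, lam + lr * (theta - cost_limit))
    return theta, lam

theta, lam = primal_dual()
# The constrained optimum sits at theta = cost_limit = 1, with lam > 0
# signalling an active constraint.
```

The dual variable `lam` rises whenever the cost exceeds the limit, pushing the primal update back toward the feasible region; at the saddle point the constraint is tight and `lam` balances the reward gradient.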
This enables the development of safer autonomous systems and decision-making agents that learn from existing data without requiring online interaction, which is crucial for high-stakes applications such as autonomous driving and medical treatment planning.