Offline Reinforcement Learning via Inverse Optimization

Abstract

Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function known as the "sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert that steers a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained with the proposed convex loss function, is amply expressive and achieves performance competitive with state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark, while using three orders of magnitude fewer parameters and thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
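To make the "sub-optimality loss" concrete, below is a minimal sketch in Python. It assumes a simple quadratic hypothesis class F(s, a) = aᵀQa + 2sᵀRa with Q positive definite, so the inner minimization over actions has a closed form; the function names and class of costs are illustrative assumptions, not the paper's actual package or API.

```python
import numpy as np

# Sketch of the IO sub-optimality loss on (state, expert_action) pairs.
# Assumed hypothesis class: F(s, a) = a' Q a + 2 s' R a with Q positive
# definite, so argmin_a F(s, a) = -Q^{-1} R' s in closed form.
# All names here are illustrative, not the paper's actual API.

def suboptimality_loss(Q, R, states, actions):
    """Average of F(s, a_expert) - min_a F(s, a) over the dataset.

    The loss is nonnegative, vanishes exactly when every expert action
    minimizes the learned cost, and is convex in the parameters (Q, R)
    since F is linear in them for each fixed (s, a).
    """
    Qinv = np.linalg.inv(Q)
    total = 0.0
    for s, a in zip(states, actions):
        f_expert = a @ Q @ a + 2 * (s @ R) @ a
        a_star = -Qinv @ (R.T @ s)           # unconstrained minimizer
        f_min = a_star @ Q @ a_star + 2 * (s @ R) @ a_star
        total += f_expert - f_min
    return total / len(states)

# Toy usage: 2-D states, 1-D actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 2))
actions = rng.normal(size=(32, 1))
Q = np.array([[1.0]])
R = rng.normal(size=(2, 1))
print(suboptimality_loss(Q, R, states, actions))
```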
Authors (3)
Ioannis Dimanidis
Tolga Ok
Peyman Mohajerin Esfahani
Submitted
February 27, 2025
arXiv Category
cs.LG

Key Contributions

This paper introduces a novel offline Reinforcement Learning algorithm for continuous state-action spaces that leverages Inverse Optimization and a robust Model Predictive Control expert. The key innovation is an exact and tractable convex reformulation of the robust MPC expert, which is used to mitigate distribution shift; the resulting method achieves competitive performance with significantly fewer parameters. A generic MPC sketch follows below.
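For orientation, here is a plain certainty-equivalence MPC on a nominal linear model, sketched with cvxpy. This is not the paper's robust, non-causal expert or its convex reformulation; it only illustrates the receding-horizon structure such an expert builds on, and the model, cost matrices, and bound are stand-in assumptions.

```python
import cvxpy as cp
import numpy as np

# Generic finite-horizon MPC on a nominal linear model
# x_{t+1} = A x_t + B u_t. The paper's expert is a *robust,
# non-causal* MPC with an exact convex reformulation; this
# certainty-equivalence version only shows the basic recipe.

def mpc_action(A, B, Qx, Ru, x0, horizon=10, u_max=1.0):
    n, m = B.shape
    x = cp.Variable((horizon + 1, n))
    u = cp.Variable((horizon, m))
    cost, constraints = 0, [x[0] == x0]
    for t in range(horizon):
        cost += cp.quad_form(x[t], Qx) + cp.quad_form(u[t], Ru)
        constraints += [x[t + 1] == A @ x[t] + B @ u[t],
                        cp.norm(u[t], "inf") <= u_max]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value[0]  # apply only the first action (receding horizon)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator dynamics
B = np.array([[0.0], [0.1]])
print(mpc_action(A, B, np.eye(2), 0.1 * np.eye(1), np.array([1.0, 0.0])))
```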

Business Value

Enables more efficient and robust learning of control policies from pre-collected data, which is crucial for applications where online interaction is expensive or dangerous, such as autonomous driving or industrial robotics.