Abstract
Inspired by the recent successes of Inverse Optimization (IO) across various
application domains, we propose a novel offline Reinforcement Learning (ORL)
algorithm for continuous state and action spaces, leveraging the convex loss
function called "sub-optimality loss" from the IO literature. To mitigate the
distribution shift commonly observed in ORL problems, we further employ a
robust and non-causal Model Predictive Control (MPC) expert steering a nominal
model of the dynamics using in-hindsight information stemming from the model
mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact
and tractable convex reformulation. In the second part of this study, we show
that the IO hypothesis class, trained by the proposed convex loss function,
enjoys ample expressiveness and achieves performance competitive with
state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo
benchmark while utilizing three orders of magnitude fewer parameters, thereby
requiring significantly fewer computational resources. To facilitate the
reproducibility of our results, we provide an open-source package implementing
the proposed algorithms and the experiments.
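As a rough illustration (notation ours, not taken from the paper): given an expert demonstration (ŝ, â) and a hypothesis cost F_θ(s, a) over actions a ∈ A, the sub-optimality loss from the IO literature penalizes how far the demonstrated action is from being optimal under F_θ:

\[
  \ell_{\theta}(\hat{s}, \hat{a})
  \;=\; F_{\theta}(\hat{s}, \hat{a})
  \;-\; \min_{a \in \mathcal{A}} F_{\theta}(\hat{s}, a).
\]

When F_θ is affine in θ, the first term is affine in θ and the second is a pointwise minimum of affine functions (hence concave), so ℓ_θ is convex in θ; this convexity is what makes training the IO hypothesis class tractable.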
Authors (3)
Ioannis Dimanidis
Tolga Ok
Peyman Mohajerin Esfahani
Submitted
February 27, 2025
Key Contributions
This paper introduces a novel offline Reinforcement Learning algorithm for continuous state-action spaces that leverages Inverse Optimization and a robust, non-causal Model Predictive Control expert. The key innovation is an exact and tractable convex reformulation of the robust MPC expert; the expert mitigates the distribution shift common in offline RL, and the resulting policy achieves competitive performance with significantly fewer parameters.
Business Value
Enables more efficient and robust learning of control policies from pre-collected data, which is crucial for applications where online interaction is expensive or dangerous, such as autonomous driving or industrial robotics.