📄 Abstract
Robust reinforcement learning (Robust RL) seeks to handle epistemic
uncertainty in environment dynamics, but existing approaches often rely on
nested min-max optimization, which is computationally expensive and yields
overly conservative policies. We propose Adaptive Rank Representation
(AdaRL), a bi-level optimization framework that improves robustness by
aligning policy complexity with the intrinsic dimension of the task. At the
lower level, AdaRL performs policy optimization under fixed-rank constraints
with dynamics sampled from a Wasserstein ball around a centroid model. At the
upper level, it adaptively adjusts the rank to balance the bias-variance
trade-off, projecting policy parameters onto a low-rank manifold. This design
avoids solving adversarial worst-case dynamics while ensuring robustness
without over-parameterization. Empirical results on MuJoCo continuous control
benchmarks demonstrate that AdaRL not only consistently outperforms fixed-rank
baselines (e.g., SAC) and state-of-the-art robust RL methods (e.g., RNAC,
Parseval), but also converges toward the intrinsic rank of the underlying
tasks. These results highlight that adaptive low-rank policy representations
provide an efficient and principled alternative for robust RL under model
uncertainty.
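The abstract does not say how the projection onto a low-rank manifold is computed. One standard option, assumed here purely for illustration and not taken from the paper, is truncated SVD, which gives the closest rank-r matrix in Frobenius norm. The function name `project_to_rank`, the 64x32 weight shape, and the rank value are all hypothetical.

```python
import numpy as np


def project_to_rank(weight: np.ndarray, rank: int) -> np.ndarray:
    """Return the closest rank-`rank` matrix to `weight` in Frobenius norm."""
    if not 1 <= rank <= min(weight.shape):
        raise ValueError("rank must be between 1 and min(weight.shape)")
    # Thin SVD: weight = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the top `rank` singular triplets (truncated SVD).
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


rng = np.random.default_rng(0)
# Hypothetical policy-layer weight matrix, used only for illustration.
weight = rng.standard_normal((64, 32))
low_rank = project_to_rank(weight, rank=4)

print(low_rank.shape)                    # same shape as the input
print(np.linalg.matrix_rank(low_rank))   # numerical rank after projection
```

In a sketch like this, an outer loop would choose `rank` and an inner loop would train the policy with that rank held fixed. How AdaRL actually selects the rank is not described in the abstract, so that step is left out.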
Authors (7)
Chenliang Li
Junyu Leng
Jiaxiang Li
Youbang Sun
Shixiang Chen
Shahin Shahrampour
+1 more
Submitted
October 13, 2025
Key Contributions
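Proposes AdaRL, a bi-level optimization framework for robust RL that aligns policy complexity with the intrinsic dimension of the task through adaptive low-rank constraints. The lower level optimizes the policy at a fixed rank, with dynamics sampled from a Wasserstein ball around a centroid model, which avoids solving for adversarial worst-case dynamics via expensive nested min-max optimization. The upper level adjusts the rank to balance the bias-variance trade-off, targeting robustness without over-parameterization.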
Business Value
Enables the development of more reliable and adaptable robotic systems and autonomous agents that can perform tasks effectively under uncertain conditions, reducing the need for extensive re-training or manual intervention.