Abstract
In this work, we propose three efficient restart paradigms for model-free non-stationary reinforcement learning (RL). We identify two core issues with the restart design of Mao et al. (2022)'s RestartQ-UCB algorithm: (1) complete forgetting, where all information learned about the environment is lost after a restart, and (2) scheduled restarts, where restarts occur only at predefined times, regardless of whether the current policy has become incompatible with the environment dynamics. We introduce three approaches, which we call partial, adaptive, and selective restarts, to modify the algorithms RestartQ-UCB and RANDOMIZEDQ (Wang et al., 2025). We find near-optimal empirical performance in multiple environments, decreasing dynamic regret by up to 91% relative to RestartQ-UCB.
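To make the two issues concrete, the sketch below contrasts a scheduled full restart (complete forgetting at fixed times) with an adaptively triggered partial restart (keeping a fraction of the learned values, and restarting only when the policy looks stale). This is an illustrative toy, not the authors' algorithms: the non-stationary bandit, the TD-error trigger, the threshold, and the retention factor are all hypothetical choices made for the example.

```python
"""Illustrative sketch only: scheduled full restart vs. adaptive partial restart.
The bandit environment, trigger rule, and retention factor are hypothetical."""
import numpy as np

rng = np.random.default_rng(0)

N_ARMS, HORIZON, CHANGE_AT = 2, 2_000, 1_000
ALPHA, EPS = 0.1, 0.1        # learning rate / epsilon-greedy exploration
RESTART_EVERY = 500          # period of the scheduled restart (complete forgetting)
RETAIN = 0.5                 # fraction of Q kept on a partial restart (hypothetical)
TRIGGER = 0.4                # hypothetical TD-error threshold for an adaptive restart


def reward(arm: int, t: int) -> float:
    """Non-stationary bandit: the better arm switches at CHANGE_AT."""
    means = [0.9, 0.1] if t < CHANGE_AT else [0.1, 0.9]
    return float(rng.random() < means[arm])


def run(restart_mode: str) -> float:
    q = np.zeros(N_ARMS)
    td_error_ema = 0.0       # running average of |TD error|, used as a change signal
    total = 0.0
    for t in range(HORIZON):
        arm = int(rng.integers(N_ARMS)) if rng.random() < EPS else int(np.argmax(q))
        r = reward(arm, t)
        td = r - q[arm]
        q[arm] += ALPHA * td
        td_error_ema = 0.8 * td_error_ema + 0.2 * abs(td)
        total += r

        if restart_mode == "scheduled_full" and (t + 1) % RESTART_EVERY == 0:
            q[:] = 0.0                    # complete forgetting at predefined times
        elif restart_mode == "adaptive_partial" and td_error_ema > TRIGGER:
            q *= RETAIN                   # keep part of what was already learned
            td_error_ema = 0.0            # restart only when the policy looks stale
    return total


for mode in ("scheduled_full", "adaptive_partial"):
    print(mode, run(mode))
```

Running the sketch prints the cumulative reward of each variant; the point is not the numbers but the structural difference: the scheduled variant discards everything on a fixed clock, while the adaptive-partial variant reacts to evidence of change and retains part of its estimates.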
Authors (5)
Hiroshi Nonaka
Simon Ambrozak
Sofia R. Miskala-Dinc
Amedeo Ercole
Aviva Prins
Submitted
October 13, 2025
Key Contributions
This paper proposes three efficient restart paradigms (partial, adaptive, and selective) for model-free reinforcement learning in non-stationary environments. These methods address the complete forgetting and rigidly scheduled restarts of existing algorithms, reducing dynamic regret by up to 91% relative to RestartQ-UCB in experiments.
Business Value
Enables more robust and efficient learning for autonomous systems operating in dynamic environments, such as robots or self-driving cars, leading to improved performance and reliability.