
Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization

Abstract

Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time-dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties will play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network and (ii) directed exploration for rapid adaptation to changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both. The evidential design ensures a fast and accurate approximation of the uncertainty around the state value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by changes in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm Evidential Proximal Policy Optimization (EPPO) due to the integral role of evidential uncertainty quantification in both policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
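The page carries no code, but the notion of an evidential critic has a well-known reference point. Below is a minimal sketch, assuming a Normal-Inverse-Gamma (NIG) parameterization of the state-value distribution in the style of deep evidential regression; the module name `EvidentialValueHead`, the architecture, and the regularization coefficient are illustrative assumptions, not EPPO's actual implementation.

```python
# Minimal sketch (assumption): an evidential value head parameterizing a
# Normal-Inverse-Gamma (NIG) distribution over the state value, in the style
# of deep evidential regression. Not the authors' implementation of EPPO.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EvidentialValueHead(nn.Module):
    """Maps a state feature vector to NIG parameters (gamma, nu, alpha, beta)."""

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.Tanh())
        self.out = nn.Linear(hidden_dim, 4)  # gamma, nu, alpha, beta

    def forward(self, features: torch.Tensor):
        gamma, log_nu, log_alpha, log_beta = self.out(self.body(features)).chunk(4, dim=-1)
        nu = F.softplus(log_nu) + 1e-6               # nu > 0
        alpha = F.softplus(log_alpha) + 1.0 + 1e-6   # alpha > 1 so the variance exists
        beta = F.softplus(log_beta) + 1e-6           # beta > 0
        return gamma, nu, alpha, beta


def evidential_nll(gamma, nu, alpha, beta, target, coef_reg: float = 0.01):
    """Negative log-likelihood of the NIG's Student-t marginal plus an evidence regularizer."""
    omega = 2.0 * beta * (1.0 + nu)
    nll = (
        0.5 * torch.log(torch.pi / nu)
        - alpha * torch.log(omega)
        + (alpha + 0.5) * torch.log(nu * (target - gamma) ** 2 + omega)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )
    # Penalize evidence on large errors so uncertainty grows under distribution shift.
    reg = torch.abs(target - gamma) * (2.0 * nu + alpha)
    return (nll + coef_reg * reg).mean()
```

Under this parameterization the predicted value is gamma, the aleatoric variance is beta / (alpha - 1), and the epistemic variance beta / (nu * (alpha - 1)) is the kind of fast uncertainty signal the abstract credits with detecting distributional shifts and preserving critic plasticity.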

Key Contributions

Proposes Evidential Proximal Policy Optimization (EPPO) to overcome non-stationary dynamics in continuous control tasks. EPPO uses an evidential critic to accurately estimate uncertainty around state values, preserving critic plasticity and enabling directed exploration for rapid adaptation to changing dynamics, thus improving stability and performance.
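The abstract notes that a probabilistic critic turns the actor objective into a random variable, which permits directed exploration. One common way to realize this idea (an illustrative choice, not necessarily the paper's mechanism) is to add an optimism bonus proportional to the critic's value uncertainty to the advantage inside a standard PPO clipped surrogate; `optimism_coef` and `value_std` below are assumed names for this sketch.

```python
# Illustrative sketch only: an optimism-in-the-face-of-uncertainty variant of the PPO
# clipped objective, where value uncertainty from a probabilistic critic inflates the
# advantage. The bonus form and coefficient are assumptions, not EPPO's exact scheme.
import torch


def ppo_clipped_loss(log_prob_new, log_prob_old, advantage, value_std,
                     clip_eps: float = 0.2, optimism_coef: float = 0.5):
    """PPO clipped surrogate with an uncertainty-driven optimism bonus on the advantage."""
    optimistic_adv = advantage + optimism_coef * value_std  # directed-exploration bonus
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * optimistic_adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * optimistic_adv
    return -torch.min(unclipped, clipped).mean()
```

With value_std set to zero this reduces to the standard PPO objective, so the exploration pressure is governed entirely by how much uncertainty the critic reports.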

Business Value

Enables the development of more robust and adaptable robotic and autonomous systems that can operate reliably in unpredictable and changing real-world environments.