Abstract
Sparse-reward reinforcement learning (RL) can model a wide range of highly
complex tasks. Solving sparse-reward tasks is RL's core premise, requiring
efficient exploration coupled with long-horizon credit assignment, and
overcoming these challenges is key for building self-improving agents with
superhuman ability. Prior work commonly explores with the objective of solving
many sparse-reward tasks, making exploration of individual high-dimensional,
long-horizon tasks intractable. We argue that solving such challenging tasks
requires solving simpler tasks that are relevant to the target task, i.e.,
whose achievement will teach the agent skills required for solving the target
task. We demonstrate that this sense of direction, necessary for effective
exploration, can be extracted from existing RL algorithms, without leveraging
any prior information. To this end, we propose a method for directed
sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects
exploratory goals in the direction of the target task. We connect DISCOVER to
principled exploration in bandits, formally bounding the time until the target
task becomes achievable in terms of the agent's initial distance to the target,
but independent of the volume of the space of all tasks. We then perform a
thorough evaluation in high-dimensional environments. We find that the directed
goal selection of DISCOVER solves exploration problems that are beyond the
reach of prior state-of-the-art exploration methods in RL.
Authors (4)
Leander Diaz-Bone
Marco Bagatella
Jonas Hübotter
Andreas Krause
Key Contributions
This paper proposes DISCOVER, a method for directed sparse-reward RL that addresses the intractability of exploring individual high-dimensional, long-horizon tasks. It argues that solving simpler, relevant sub-tasks is key and introduces a way to extract this 'sense of direction' from existing RL algorithms without prior information, enabling more effective exploration and skill acquisition.
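The core idea of selecting exploratory goals "in the direction of" the target task can be illustrated with a minimal sketch. This is a hypothetical stand-in, not DISCOVER's actual objective: the abstract does not specify the selection criterion, so the sketch assumes goals live in a Euclidean space, and the names `select_directed_goal`, `agent_pos`, and `reach_radius` are invented for illustration. Among candidate goals the agent can plausibly reach, it picks the one closest to the target:

```python
import numpy as np

def select_directed_goal(candidates, agent_pos, target, reach_radius=1.0):
    """Illustrative directed goal selection: among candidate goals within
    reach_radius of the agent's current position, return the one nearest
    to the target task. Not DISCOVER's actual objective."""
    candidates = np.asarray(candidates, dtype=float)
    # Distance from the agent to each candidate goal.
    dist_to_agent = np.linalg.norm(candidates - agent_pos, axis=1)
    # Keep only goals the agent is assumed able to reach now.
    reachable = candidates[dist_to_agent <= reach_radius]
    if len(reachable) == 0:
        # Nothing within reach: fall back to the single nearest candidate.
        reachable = candidates[np.argsort(dist_to_agent)[:1]]
    # Among reachable goals, pick the one that makes the most progress
    # toward the target task.
    dist_to_target = np.linalg.norm(reachable - target, axis=1)
    return reachable[np.argmin(dist_to_target)]

# Example: agent at the origin, target far to the right; all three
# candidates are reachable, and the one toward the target is chosen.
goal = select_directed_goal(
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    agent_pos=np.array([0.0, 0.0]),
    target=np.array([5.0, 0.0]),
    reach_radius=1.0,
)
```

This captures only the geometric intuition of a "sense of direction"; the paper extracts that direction from quantities learned by existing RL algorithms rather than from raw positional distances.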
Business Value
Enables the development of more capable AI agents that can learn complex tasks with limited explicit reward signals, leading to advancements in autonomous systems, robotics, and sophisticated game AI.