Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: The exploration problem is one of the main challenges in deep reinforcement
learning (RL). Recent promising works tried to handle the problem with
population-based methods, which collect samples with diverse behaviors derived
from a population of different exploratory policies. Adaptive policy selection
has been adopted for behavior control. However, the behavior selection space is
largely limited by the predefined policy population, which further limits
behavior diversity. In this paper, we propose a general framework called
Learnable Behavioral Control (LBC) to address the limitation, which a) enables
a significantly enlarged behavior selection space via formulating a hybrid
behavior mapping from all policies; b) constructs a unified learnable process
for behavior selection. We introduce LBC into distributed off-policy
actor-critic methods and achieve behavior control via optimizing the selection
of the behavior mappings with bandit-based meta-controllers. Our agents have
achieved 10077.52% mean human normalized score and surpassed 24 human world
records within 1B training frames in the Arcade Learning Environment, which
demonstrates our significant state-of-the-art (SOTA) performance without
degrading the sample efficiency.
Authors (8)
Jiajun Fan
Yuzheng Zhuang
Yuecheng Liu
Jianye Hao
Bin Wang
Jiangcheng Zhu
+2 more
Key Contributions
Introduces Learnable Behavioral Control (LBC), a general framework that significantly enlarges the behavior selection space by formulating a hybrid behavior mapping and constructing a unified learnable process for behavior selection. This framework, integrated with distributed off-policy actor-critic methods and bandit-based meta-controllers, achieves state-of-the-art performance on Atari games.
Business Value
More efficient and effective RL agents can be developed for complex tasks in robotics, game AI, and autonomous systems, reducing training time and improving performance.