
Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Abstract

The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of different exploratory policies. Adaptive policy selection has been adopted for behavior control. However, the behavior selection space is largely limited by the predefined policy population, which further limits behavior diversity. In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified learnable process for behavior selection. We introduce LBC into distributed off-policy actor-critic methods and achieve behavior control via optimizing the selection of the behavior mappings with bandit-based meta-controllers. Our agents have achieved 10077.52% mean human normalized score and surpassed 24 human world records within 1B training frames in the Arcade Learning Environment, which demonstrates our significant state-of-the-art (SOTA) performance without degrading the sample efficiency.
Authors (8)
Jiajun Fan
Yuzheng Zhuang
Yuecheng Liu
Jianye Hao
Bin Wang
Jiangcheng Zhu
+2 more
Submitted: May 9, 2023
arXiv Category: cs.LG

Key Contributions

Introduces Learnable Behavioral Control (LBC), a general framework that enlarges the behavior selection space beyond the predefined policy population by formulating a hybrid behavior mapping from all policies, and that turns behavior selection into a unified learnable process. Integrated into distributed off-policy actor-critic methods, with bandit-based meta-controllers optimizing the selection of behavior mappings, LBC reaches a 10077.52% mean human normalized score and surpasses 24 human world records within 1B training frames in the Arcade Learning Environment.
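
To make the idea of selecting among behavior mappings more concrete, here is a minimal sketch. It is not the paper's implementation: the summary above does not specify the form of the hybrid mapping or the bandit, so this sketch assumes a simple weighted mixture of the policies' action distributions and a plain UCB1 bandit as stand-ins. The names `mix_policies` and `UCBController` are hypothetical.

```python
import math


def mix_policies(policy_probs, weights):
    """Assumed hybrid mapping: a weighted mixture of action distributions.

    policy_probs: one action-probability list per policy in the population.
    weights: one non-negative weight per policy (normalized here).
    """
    total = sum(weights)
    n_actions = len(policy_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, policy_probs)) / total
        for a in range(n_actions)
    ]


class UCBController:
    """Plain UCB1 bandit; each arm is one candidate behavior mapping."""

    def __init__(self, n_arms, c=1.0):
        self.counts = [0] * n_arms   # number of times each arm was chosen
        self.means = [0.0] * n_arms  # running mean return of each arm
        self.c = c                   # exploration coefficient (assumed value)

    def select(self):
        # Try every arm once before applying the UCB rule.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)
        scores = [
            m + self.c * math.sqrt(math.log(t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, episode_return):
        # Incremental update of the chosen arm's mean return.
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]
```

In this sketch the arms could be weight vectors such as `[1, 0]`, `[0, 1]`, and `[0.5, 0.5]` over a two-policy population. The third arm yields a behavior that neither policy produces on its own, which is the sense in which a mapping over all policies gives a larger selection space than picking a single policy from the population.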

Business Value

More sample-efficient exploration could shorten training and improve the performance of RL agents on complex tasks in robotics, game AI, and autonomous systems. The results summarized here come from Atari games in the Arcade Learning Environment.