Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection

Abstract

The exploration problem is one of the main challenges in deep reinforcement learning (RL). Recent promising works tried to handle the problem with population-based methods, which collect samples with diverse behaviors derived from a population of different exploratory policies. Adaptive policy selection has been adopted for behavior control. However, the behavior selection space is largely limited by the predefined policy population, which further limits behavior diversity. In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified learnable process for behavior selection. We introduce LBC into distributed off-policy actor-critic methods and achieve behavior control via optimizing the selection of the behavior mappings with bandit-based meta-controllers. Our agents have achieved 10077.52% mean human normalized score and surpassed 24 human world records within 1B training frames in the Arcade Learning Environment, which demonstrates our significant state-of-the-art (SOTA) performance without degrading the sample efficiency.
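The headline figure in the abstract is a mean human normalized score (HNS). As commonly defined for Atari benchmarks, the per-game HNS rescales the agent's score so that a random policy maps to 0% and the human baseline maps to 100%; the mean is then taken over games. The sketch below shows the arithmetic only. The function names and all the numbers are hypothetical and are not taken from the paper.

```python
def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return HNS in percent: 100 * (agent - random) / (human - random)."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


def mean_hns(scores: dict[str, tuple[float, float, float]]) -> float:
    """Average the per-game HNS over (agent, random, human) score triples."""
    values = [human_normalized_score(a, r, h) for a, r, h in scores.values()]
    return sum(values) / len(values)


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
example = {
    "game_a": (1100.0, 100.0, 600.0),  # 100 * (1100 - 100) / (600 - 100) = 200%
    "game_b": (50.0, 0.0, 100.0),      # 100 * (50 - 0) / (100 - 0) = 50%
}
print(mean_hns(example))  # 125.0
```

Because the mean is unbounded above, a few games with very large per-game scores can dominate it, which is worth keeping in mind when reading a figure in the thousands of percent.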

Authors (8)

Jiajun Fan

Yuzheng Zhuang

Yuecheng Liu

Jianye Hao

Bin Wang

Jiangcheng Zhu

+2 more

Submitted

May 9, 2023

arXiv Category

cs.LG

Key Contributions

Introduces Learnable Behavioral Control (LBC), a general framework that significantly enlarges the behavior selection space by formulating a hybrid behavior mapping and constructing a unified learnable process for behavior selection. This framework, integrated with distributed off-policy actor-critic methods and bandit-based meta-controllers, achieves state-of-the-art performance on Atari games.
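To make the two ideas above concrete, here is a minimal sketch of how a behavior could be formed from all policies in a population and then selected by a bandit. The summary does not specify the form of the hybrid mapping or the bandit algorithm, so everything here is an assumption made for illustration: the mapping is a weighted mixture of per-policy action distributions, the meta-controller is a UCB-style bandit over a small fixed set of weight vectors, and the reward is a toy stand-in for an episode return. The names `mix_policies` and `UCBMetaController` are hypothetical.

```python
import math
import random


def mix_policies(policy_probs, weights):
    """Blend per-policy action distributions into one behavior distribution."""
    total = sum(weights)
    n_actions = len(policy_probs[0])
    return [
        sum(w * probs[a] for w, probs in zip(weights, policy_probs)) / total
        for a in range(n_actions)
    ]


class UCBMetaController:
    """UCB-style bandit whose arms are candidate mixture-weight vectors."""

    def __init__(self, arms, c=1.0):
        self.arms = arms
        self.c = c  # exploration coefficient (assumed value)
        self.counts = [0] * len(arms)
        self.means = [0.0] * len(arms)

    def select(self):
        # Try every arm once before applying the upper-confidence rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        t = sum(self.counts)
        ucb = [
            m + self.c * math.sqrt(math.log(t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, episode_return):
        # Incremental running mean of the returns observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]


# Two hypothetical policies over three actions: one near-greedy, one uniform.
policy_probs = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
# Candidate behavior mappings: pure policy 0, an even blend, pure policy 1.
arms = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

controller = UCBMetaController(arms)
rng = random.Random(0)
for _ in range(200):
    arm = controller.select()
    behavior = mix_policies(policy_probs, arms[arm])
    action = rng.choices(range(3), weights=behavior)[0]
    # Toy reward standing in for an episode return: only action 1 pays off.
    controller.update(arm, 1.0 if action == 1 else 0.0)
```

The point of the sketch is the shape of the selection space: selecting among individual policies would give only the two pure arms, whereas selecting among mappings over all policies also admits blends such as the middle arm.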

Business Value

More efficient and effective RL agents can be developed for complex tasks in robotics, game AI, and autonomous systems, reducing training time and improving performance.

Paper Metadata

Innovation Type

Algorithmic Framework

Deployment Feasibility

Moderate. Primarily demonstrated in simulation (Atari), but the principles are applicable to real-world RL problems.

Limitations Addressed

Limited behavior selection space in population-based RL methods, Inefficient exploration strategies, Poor sample efficiency

Performance Gains

Achieved state-of-the-art performance on Atari games: a 10077.52% mean human normalized score and 24 human world records surpassed within 1B training frames, without degrading sample efficiency.

Technical Tags

Reinforcement Learning (RL), Exploration Problem, Population-Based Methods, Behavior Control, Learnable Behavioral Control (LBC), Sample Efficiency, Off-Policy Actor-Critic, Bandit-based Meta-Controllers, Atari Games

Research Topics

Deep Reinforcement Learning, Exploration Strategies, Policy Optimization, Multi-agent RL, Game AI

Methods & Architectures

Learnable Behavioral Control (LBC), Population-based methods, Distributed off-policy actor-critic, Bandit-based meta-controllers, Hybrid behavior mapping, Actor-Critic Architectures, Meta-controllers

Applications & Tasks

Video Games, Robotics, Simulation Environments, Exploration in RL, Behavior diversity, Sample efficiency, Policy selection, Breaking Atari human world records, Improving sample efficiency in RL, Enhancing exploration in complex environments

Datasets & Benchmarks

Benchmarks

Atari Human World Records

Game scores, Sample efficiency

Related Fields

Reinforcement Learning, Deep Learning, Artificial Intelligence, Game Development

Keywords

reinforcement learning, exploration, behavior control, sample efficiency, Atari, actor-critic, meta-controller, population-based, deep RL, game playing

Academic Context

#Deep Reinforcement Learning, #Exploration Strategies, #Policy Optimization, #Multi-agent RL, #Game AI

Commercial Potential

Potential Products

Advanced game AI agents, Robotic control systems, Autonomous simulation agents

Target Industries

Gaming, Robotics, Simulation, Autonomous Systems

Use Case Examples

Developing AI players for complex video games, Training robots for intricate manipulation tasks, Optimizing exploration strategies in simulated environments

Competitive Edge

Addresses the exploration challenge in RL with a novel 'Learnable Behavioral Control' framework that enlarges the behavior selection space and optimizes behavior selection, outperforming existing population-based methods.

Market Opportunity

Significant market in gaming AI and robotics.

Revenue Models

Licensing of AI agents, Development of game AI solutions

Resource Requirements

Compute Needs

High (distributed training, multiple policies)

Data Requirements

Requires access to environments for interaction (e.g., Atari emulator).

Deployment Constraints

Real-world deployment requires careful consideration of safety and generalization.

Scalability

The framework is designed for distributed settings, suggesting good scalability.

Production Readiness

Maturity Level

Research

Time to Market

Medium

Patent Potential

Moderate (novel framework)
