DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

Abstract

Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. Prior work commonly explores with the objective of solving many sparse-reward tasks, making exploration of individual high-dimensional, long-horizon tasks intractable. We argue that solving such challenging tasks requires solving simpler tasks that are relevant to the target task, i.e., whose achievement will teach the agent skills required for solving the target task. We demonstrate that this sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without leveraging any prior information. To this end, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent's initial distance to the target, but independent of the volume of the space of all tasks. We then perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
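
The idea of selecting exploratory goals "in the direction of the target task" can be illustrated with a minimal Python sketch. This is not the authors' implementation: the scoring rule (an optimistic, weighted sum of how achievable a goal is and how relevant it is to the target), the function name `select_goal`, and the parameters `alpha` and `beta` are assumptions made for illustration, loosely in the spirit of upper-confidence-bound rules from the bandit literature.

```python
def select_goal(reach_mean, reach_std, rel_mean, rel_std, alpha=0.5, beta=1.0):
    """Return the index of the candidate goal with the highest directed score.

    reach_mean / reach_std: estimated value (and uncertainty) of reaching each
        candidate goal from the start state.
    rel_mean / rel_std: estimated value (and uncertainty) of reaching the
        target from each candidate goal.
    alpha trades off achievability against relevance; beta scales the
    optimism bonus. All inputs are hypothetical for this sketch.
    """
    def score(i):
        achievability = reach_mean[i] + beta * reach_std[i]
        relevance = rel_mean[i] + beta * rel_std[i]
        return alpha * achievability + (1.0 - alpha) * relevance

    return max(range(len(reach_mean)), key=score)


# Toy candidates: goal 0 is easy but irrelevant, goal 2 is relevant but
# currently out of reach, goal 1 is a stepping stone between the two.
reach_mean = [0.9, 0.6, 0.05]
reach_std = [0.0, 0.1, 0.05]
rel_mean = [0.1, 0.5, 0.9]
rel_std = [0.0, 0.1, 0.05]

chosen = select_goal(reach_mean, reach_std, rel_mean, rel_std)
print(chosen)  # 1, the stepping-stone goal
```

In this sketch, `alpha=1.0` reduces to picking the most achievable goal and `alpha=0.0` to picking the goal estimated closest to the target; intermediate values favor stepping-stone goals.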

Authors (4)

Leander Diaz-Bone

Marco Bagatella

Jonas Hübotter

Andreas Krause

Submitted

May 26, 2025

arXiv Category

cs.LG

Key Contributions

This paper proposes DISCOVER, a method for directed sparse-reward RL that addresses the intractability of exploring individual high-dimensional, long-horizon tasks. It argues that solving simpler, relevant sub-tasks is key and introduces a way to extract this 'sense of direction' from existing RL algorithms without prior information, enabling more effective exploration and skill acquisition.

Business Value

Enables the development of more capable AI agents that can learn complex tasks with limited explicit reward signals, leading to advancements in autonomous systems, robotics, and sophisticated game AI.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

Moderate. Training RL agents requires significant computational resources, but the resulting agent is deployable once trained.

Limitations Addressed

Intractability of exploration in individual high-dimensional, long-horizon sparse-reward tasks; lack of efficient exploration strategies in standard RL; difficulty in credit assignment over long horizons

Technical Tags

sparse-reward RL, exploration, credit assignment, goal-conditioned RL, curriculum learning, long-horizon tasks, skill acquisition, self-improving agents

Research Topics

Efficient Exploration in RL, Curriculum Learning for Complex Tasks, Long-Horizon Reinforcement Learning, Skill Discovery and Transfer

Methods & Architectures

DISCOVER (Directed Sparse-reward goal-conditioned very long-horizon RL), Goal-conditioned reinforcement learning, Curriculum learning, Reinforcement Learning Agents

Applications & Tasks

Robotics, Game Playing, Complex task automation, Solving sparse-reward tasks, Efficient exploration in high-dimensional state spaces, Long-horizon credit assignment, Learning complex tasks with sparse rewards, Developing self-improving agents, Enabling efficient exploration

Related Fields

Reinforcement Learning, Machine Learning, Artificial Intelligence, Robotics, Game AI

Keywords

reinforcement learning, sparse rewards, exploration, curriculum learning, long horizon, credit assignment, goal-conditioned, skill learning, self-improvement, agent, robotics, games

Academic Context

Efficient Exploration in RL, Curriculum Learning for Complex Tasks, Long-Horizon Reinforcement Learning, Skill Discovery and Transfer

Commercial Potential

Potential Products

Advanced RL training platforms, Robotics control software, AI game agents

Target Industries

Robotics, Gaming, Autonomous Systems, Logistics

Use Case Examples

Training a robot arm to perform a complex assembly task with only a final success reward. Developing AI agents for strategy games with delayed rewards.

Competitive Edge

Provides a more directed and efficient exploration strategy compared to random or naive exploration methods in sparse-reward settings.

Market Opportunity

Growing market for advanced AI agents in simulation and real-world applications.

Revenue Models

Licensing of algorithms, development of specialized AI agents, consulting services.

Resource Requirements

Compute Needs

High, typical for training complex RL agents over long horizons.

Data Requirements

Requires environments that can generate sparse rewards and allow for goal-conditioned learning.
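
As a rough illustration of what such an environment looks like, the following Python sketch defines a tiny goal-conditioned chain world with a sparse reward. The class `ChainEnv`, its `reset`/`step` interface, and the reward convention (1.0 only when the goal is reached, 0.0 otherwise) are hypothetical and are not taken from the paper or from any particular RL library.

```python
class ChainEnv:
    """Hypothetical 1-D chain: the agent starts at cell 0 and moves left or right.

    The reward is sparse: 1.0 only on the step where the agent stands on the
    goal cell, and 0.0 on every other step.
    """

    def __init__(self, length=10, horizon=50):
        self.length = length
        self.horizon = horizon
        self.state = 0
        self.goal = 0
        self.t = 0

    def reset(self, goal):
        """Start a new episode conditioned on the given goal cell."""
        if not 0 <= goal < self.length:
            raise ValueError("goal must lie inside the chain")
        self.state, self.goal, self.t = 0, goal, 0
        return (self.state, self.goal)

    def step(self, action):
        """Apply action -1 (left) or +1 (right); return (observation, reward, done)."""
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        self.state = min(max(self.state + action, 0), self.length - 1)
        self.t += 1
        reward = 1.0 if self.state == self.goal else 0.0
        done = reward == 1.0 or self.t >= self.horizon
        return (self.state, self.goal), reward, done


env = ChainEnv(length=5, horizon=10)
obs = env.reset(goal=3)
rewards = []
done = False
while not done:
    obs, reward, done = env.step(+1)  # always move right
    rewards.append(reward)
print(rewards)  # [0.0, 0.0, 1.0]
```

Because the observation carries the goal alongside the state, the same environment can serve any intermediate goal a curriculum proposes, not only the final target.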

Deployment Constraints

Training time and computational cost can be significant. Transferability of learned skills to slightly different environments needs validation.

Scalability

Training cost grows with task complexity and the size of the state-action space. The curriculum approach is intended to aid scalability.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years for practical applications in robotics or complex games.

Patent Potential

Low to moderate, for specific algorithmic components.
