
Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines


📄 Abstract

Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Our experiments show that CoRM scales better than state-of-the-art RM algorithms for long-horizon problems with unordered subtasks.
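To make the exponential growth concrete, the following is a minimal sketch of a plain (Boolean-style) reward machine for subtasks that may be completed in any order. It assumes the common encoding in which each RM state is the set of subtasks already completed; the function names `build_unordered_rm` and `run`, and the reward of 1 on completing the last subtask, are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def build_unordered_rm(subtasks):
    """Build a plain RM whose states are the sets of completed subtasks.

    Returns (states, delta, reward): delta maps (state, event) to the next
    state and reward maps (state, event) to a scalar reward.
    """
    subtasks = tuple(subtasks)
    # One RM state per subset of completed subtasks: 2**n states in total.
    states = [frozenset(c)
              for r in range(len(subtasks) + 1)
              for c in combinations(subtasks, r)]
    goal = frozenset(subtasks)
    delta, reward = {}, {}
    for u in states:
        for event in subtasks:
            if event in u:
                continue  # a finished subtask triggers no transition
            v = u | {event}
            delta[(u, event)] = v
            # Assumed reward scheme: 1 on completing the last subtask, else 0.
            reward[(u, event)] = 1.0 if v == goal else 0.0
    return states, delta, reward


def run(delta, reward, events):
    """Feed a sequence of events through the RM; return (final state, return)."""
    u, total = frozenset(), 0.0
    for e in events:
        if (u, e) in delta:
            total += reward[(u, e)]
            u = delta[(u, e)]
    return u, total


# The number of states and transitions grows exponentially with n.
for n in (2, 4, 8):
    states, delta, _ = build_unordered_rm(range(n))
    print(n, len(states), len(delta))
# prints:
# 2 4 4
# 4 16 32
# 8 256 1024
```

An agent that learns one value function per RM state therefore has 2^n of them to learn for n unordered subtasks, which is the scaling problem the abstract describes.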

Authors (5)

Kristina Levina

Nikolaos Pappas

Athanasios Karapantelakis

Aneta Vulgarakis Feljan

Jendrik Seipp

Submitted

October 31, 2025

arXiv Category

cs.AI


Key Contributions

This work introduces three generalizations of reward machines (RMs), namely Numeric, Agenda, and Coupled RMs, to address the limitations of standard RMs on long-horizon tasks whose subtasks can be completed in any order. It also proposes a compositional learning algorithm, Q-learning with Coupled RMs (CoRM), which the authors report scales better than state-of-the-art RM algorithms on such tasks.
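The agenda idea can be illustrated with a small sketch. It follows only the one-sentence description in the abstract (an agenda "tracks the remaining subtasks to complete") and is not the paper's formal Agenda RM or the CoRM algorithm. The class `AgendaTracker`, its method names, and the reward of 1 when the agenda empties are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgendaTracker:
    """Hypothetical helper: tracks which subtasks remain, in any order."""
    remaining: set
    done: list = field(default_factory=list)

    def __post_init__(self):
        # Copy so the caller's collection is not mutated.
        self.remaining = set(self.remaining)

    def observe(self, event):
        """Remove a completed subtask from the agenda and return a reward."""
        if event not in self.remaining:
            return 0.0  # unknown or already finished subtask: nothing changes
        self.remaining.discard(event)
        self.done.append(event)
        # Assumed reward scheme: 1 when the agenda becomes empty, else 0.
        return 1.0 if not self.remaining else 0.0

    @property
    def finished(self):
        return not self.remaining


tracker = AgendaTracker({"a", "b", "c"})
rewards = [tracker.observe(e) for e in ["c", "a", "a", "b"]]
print(rewards, tracker.finished)
```

The point of the sketch is that the task description stays linear in the number of subtasks: the agenda is a single set, rather than one explicitly enumerated machine state per subset of completed subtasks.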

Business Value

Enables more efficient training of RL agents for complex, multi-step tasks, accelerating development in areas like robotics and autonomous systems.

Paper Metadata

Innovation Type

Algorithmic Extension / New Algorithm

Deployment Feasibility

Moderate. Requires careful design of RMs and integration into RL frameworks.

Limitations Addressed

Addresses the limitation of standard RMs for long-horizon problems with unordered subtasks, where information requirements grow exponentially. It also tackles sample inefficiency in complex RL tasks.
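As a worked count, assume the usual encoding in which each state of a plain RM records the subset of subtasks already completed (an assumption about the baseline, not a formula quoted from the paper). For $n$ unordered subtasks this gives:

```latex
\begin{align}
|U| &= \sum_{k=0}^{n} \binom{n}{k} = 2^{n}, \\
|\delta| &= \sum_{k=0}^{n} \binom{n}{k}\,(n-k) = n\,2^{n-1}, \\
|Q| &= |S| \cdot |U| \cdot |A| = |S| \cdot 2^{n} \cdot |A|.
\end{align}
```

Here $U$ is the set of RM states, $|\delta|$ counts the transitions that complete a new subtask, and $|Q|$ is the size of a tabular Q-function over the cross-product of environment states $S$ and RM states $U$ with actions $A$. For $n = 10$ this is already 1024 RM states and 5120 transitions.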

Performance Gains

Experiments show that CoRM scales better than state-of-the-art RM algorithms on long-horizon problems with unordered subtasks.

Technical Tags

Reward Machines (RMs), Reinforcement Learning, Long-Horizon Tasks, Unordered Tasks, Compositional Learning, Numeric RMs, Agenda RMs, Coupled RMs, Q-learning with Coupled RMs (CoRM)

Research Topics

Efficient RL for Complex Tasks, Hierarchical Reinforcement Learning, Task Decomposition, Reward Shaping, Non-Markovian Tasks

Methods & Architectures

Generalization of Reward Machines, Compositional Learning Algorithm (CoRM), Q-learning, Numeric RMs, Agenda RMs, Coupled RMs

Applications & Tasks

Robotics, Game Playing, Task Planning, Autonomous Systems, Learning Long-Horizon Unordered Tasks, Exponential Information Growth in RMs, Sample Inefficiency in RL, Reinforcement Learning, Task Execution, Planning

Related Fields

Reinforcement Learning, Artificial Intelligence, Robotics, Planning, Control Theory

Keywords

Reinforcement Learning, Reward Machines, Long-Horizon Tasks, Unordered Tasks, Compositional Learning, RL Efficiency, CoRM, Q-learning, Hierarchical RL, Task Planning

Academic Context

#Efficient RL for Complex Tasks, #Hierarchical Reinforcement Learning, #Task Decomposition, #Reward Shaping, #Non-Markovian Tasks

Commercial Potential

Potential Products

RL training platforms, Robotic control systems, AI agents for complex simulations

Target Industries

Robotics, Gaming, Automotive, Aerospace, Logistics

Use Case Examples

Training a robot to assemble a complex product with multiple unordered steps. Developing an AI agent for a strategy game requiring long-term planning.

Competitive Edge

Extends the capabilities of Reward Machines for a broader class of complex tasks, offering a more structured and efficient approach to RL compared to traditional methods.

Market Opportunity

Growing market for reinforcement learning solutions.

Revenue Models

Licensing of algorithms, integration into AI development platforms.

Resource Requirements

Compute Needs

Moderate to High, depending on the complexity of the task and RL algorithm.

Data Requirements

Requires environments suitable for long-horizon, unordered tasks.

Deployment Constraints

Complexity of RM design, potential for state-space explosion in Coupled RMs.

Scalability

Scalability is improved compared to standard RMs for unordered tasks, but can still be challenging.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years for integration into advanced RL applications.

Patent Potential

Moderate, for the novel RM generalizations and the CoRM algorithm.
