Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines

Abstract

Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm that leverages coupled RMs: Q-learning with coupled RMs (CoRM). Our experiments show that CoRM scales better than state-of-the-art RM algorithms for long-horizon problems with unordered subtasks.
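
The exponential growth mentioned in the abstract can be made concrete with a small sketch. It is an illustration only, not the paper's construction: it assumes a plain Boolean RM for "complete every subtask, in any order" must remember which subset of subtasks is already done, so it needs one state per subset. The names `boolean_rm_for_unordered` and `agenda_step` are hypothetical, and `agenda_step` is only a loose rendering of the idea of tracking remaining subtasks, not the formal definition of an Agenda RM.

```python
from itertools import combinations


def boolean_rm_for_unordered(subtasks):
    """Enumerate an explicit RM for 'complete all subtasks in any order'.

    Assumption: each RM state is the set of subtasks completed so far, and a
    transition fires when the event of a not-yet-completed subtask is seen.
    """
    subtasks = tuple(subtasks)
    # One state per subset of completed subtasks: 2**n states in total.
    states = [
        frozenset(done)
        for k in range(len(subtasks) + 1)
        for done in combinations(subtasks, k)
    ]
    goal = frozenset(subtasks)
    transitions = {}
    for state in states:
        for event in subtasks:
            if event not in state:
                nxt = state | {event}
                # Reward 1 only on the transition that completes the last subtask.
                reward = 1.0 if nxt == goal else 0.0
                transitions[(state, event)] = (nxt, reward)
    return states, transitions


def agenda_step(agenda, event):
    """One generic update rule over the set of remaining subtasks (hypothetical)."""
    if event in agenda:
        agenda = agenda - {event}
        return agenda, (1.0 if not agenda else 0.0)
    return agenda, 0.0


if __name__ == "__main__":
    for n in (2, 4, 8):
        names = [f"t{i}" for i in range(n)]
        states, transitions = boolean_rm_for_unordered(names)
        # Columns: number of subtasks, RM states, RM transitions.
        print(n, len(states), len(transitions))
```

With n subtasks the enumeration yields 2^n states and n * 2^(n-1) transitions, so 8 subtasks already need 256 states and 1024 transitions. The single rule in `agenda_step` describes the same task without listing them, which is the kind of compactness the generalised RMs aim for.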
Authors: Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp
Submitted: October 31, 2025
arXiv category: cs.AI

Key Contributions

This work introduces three generalizations of reward machines (RMs), namely Numeric, Agenda, and Coupled RMs, to address the difficulty of learning long-horizon tasks whose subtasks can be executed in any order. It also proposes a compositional learning algorithm, Q-learning with coupled RMs (CoRM), which the authors' experiments show scales better than state-of-the-art RM algorithms on such problems.

Business Value

Better scaling on long-horizon tasks with unordered subtasks could make training RL agents for complex, multi-step tasks more efficient, which may accelerate development in areas such as robotics and autonomous systems.