This work introduces three generalizations of Reward Machines (RMs), namely Numeric, Agenda, and Coupled RMs, to address limitations in learning long-horizon, unordered tasks. It also proposes a new compositional learning algorithm, Q-learning with Coupled RMs (CoRM), demonstrating improved learning and sample efficiency.
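For background, the sketch below shows a plain reward machine, that is, a finite-state machine whose transitions emit rewards, encoding a task whose subtasks may be completed in any order. It is not the paper's Numeric, Agenda, or Coupled RM construction, and it is not CoRM; the names RewardMachine and unordered_task_rm are hypothetical and chosen only for illustration. Under these assumptions, it shows one plausible reason unordered tasks are costly for a plain RM: tracking which of n subtasks are done takes one state per subset, so 2^n states.

```python
from itertools import combinations


class RewardMachine:
    """A plain reward machine: a finite-state machine whose transitions emit rewards.

    Hypothetical illustration; not the generalized RMs proposed in the paper.
    """

    def __init__(self, initial_state, transitions, terminal_states):
        # transitions maps (state, event) -> (next_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.terminal_states = terminal_states

    def step(self, state, event):
        # Events with no listed transition leave the state unchanged, reward 0.
        return self.transitions.get((state, event), (state, 0.0))

    def states(self):
        # Collect every state that appears in the transition table.
        found = {self.initial_state}
        for (src, _event), (dst, _reward) in self.transitions.items():
            found.update((src, dst))
        return found


def unordered_task_rm(subtasks):
    """Build an RM that pays reward 1 once all subtasks are done, in any order."""
    goal = frozenset(subtasks)
    transitions = {}
    # One RM state per proper subset of completed subtasks, plus the goal state.
    for size in range(len(subtasks)):
        for done in combinations(subtasks, size):
            done = frozenset(done)
            for event in goal - done:
                nxt = done | {event}
                reward = 1.0 if nxt == goal else 0.0
                transitions[(done, event)] = (nxt, reward)
    return RewardMachine(frozenset(), transitions, {goal})


rm = unordered_task_rm(["a", "b", "c"])
state, total = rm.initial_state, 0.0
for event in ["b", "a", "b", "c"]:  # the repeated "b" is ignored
    state, reward = rm.step(state, event)
    total += reward

print(total, len(rm.states()))
```

With three subtasks the machine already has 8 states, and each additional subtask doubles that count, which is the kind of blow-up that more compact RM variants aim to avoid.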
The approach could enable more efficient training of RL agents on complex, multi-step tasks, which may accelerate development in areas such as robotics and autonomous systems.