📄 Abstract
Zero-shot reinforcement learning (RL) has emerged as a setting for developing
general agents in an unsupervised manner, capable of solving downstream tasks
without additional training or planning at test time. Unlike conventional RL,
which optimizes policies for a fixed reward, zero-shot RL requires agents to
encode representations rich enough to support immediate adaptation to any
objective, drawing parallels to vision and language foundation models. Despite
growing interest, the field lacks a common analytical lens.
We present the first unified framework for zero-shot RL. Our formulation
introduces a consistent notation and taxonomy that organizes existing
approaches and allows direct comparison between them. Central to our framework
is the classification of algorithms into two families: direct representations,
which learn end-to-end mappings from rewards to policies, and compositional
representations, which decompose the representation by leveraging the substructure
of the value function. Within this framework, we highlight shared principles
and key differences across methods, and we derive an extended bound for
successor-feature methods, offering a new perspective on their performance in
the zero-shot regime. By consolidating existing work under a common lens, our
framework provides a principled foundation for future research in zero-shot RL
and outlines a clear path toward developing more general agents.
Authors (4)
Jacopo Di Ventura
Jan Felix Kleuker
Aske Plaat
Thomas Moerland
Submitted
October 23, 2025
Key Contributions
This paper presents the first unified framework for zero-shot Reinforcement Learning (RL), providing a consistent notation and taxonomy to organize and compare existing approaches. It classifies algorithms into two families: direct representations, which learn end-to-end reward-to-policy mappings, and compositional representations, which decompose the representation by leveraging the substructure of the value function. This framework facilitates research by offering a common analytical lens for developing general agents that can solve downstream tasks without additional training.
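To make the compositional family concrete: successor-feature methods (one instance of this family) assume rewards are linear in a feature map, r(s, a) = φ(s, a)ᵀw, so that Qᵖ(s, a) = ψᵖ(s, a)ᵀw, where ψᵖ are the successor features of policy π. Once ψᵖ is learned, any new reward specified by w can be evaluated with a single dot product, with no further training. The following is a minimal tabular sketch under those assumptions (all sizes and variable names are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 5, 3, 4   # toy problem sizes (hypothetical)
gamma = 0.9                        # discount factor

# Feature map phi(s, a) in R^d; rewards are assumed linear: r = phi . w
phi = rng.normal(size=(n_states, n_actions, d))

# A fixed policy and transition model, for illustration only
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # pi[s, a]

def successor_features(phi, P, pi, gamma, iters=500):
    """Fixed-point iteration on psi(s,a) = phi(s,a) + gamma * E[psi(s',a')]."""
    psi = np.zeros_like(phi)
    for _ in range(iters):
        # Expected successor features at the next state-action under pi
        psi_next = np.einsum("sap,pb,pbd->sad", P, pi, psi)
        psi = phi + gamma * psi_next
    return psi

psi = successor_features(phi, P, pi, gamma)

# Zero-shot evaluation: for ANY reward weights w, Q^pi = psi . w
w = rng.normal(size=d)
Q = psi @ w
```

The key point is the last two lines: the expensive dynamic-programming step depends only on φ, P, and π, while the task-specific weights w enter only through a final linear readout, which is what enables immediate adaptation to a new objective.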
Business Value
Accelerates the development of more versatile and adaptable AI agents, reducing the need for task-specific training. This leads to more efficient deployment of AI in dynamic environments and a broader range of applications.