
A Unified Framework for Zero-Shot Reinforcement Learning

Abstract

Zero-shot reinforcement learning (RL) has emerged as a setting for developing general agents in an unsupervised manner, capable of solving downstream tasks without additional training or planning at test time. Unlike conventional RL, which optimizes policies for a fixed reward, zero-shot RL requires agents to encode representations rich enough to support immediate adaptation to any objective, drawing parallels to vision and language foundation models. Despite growing interest, the field lacks a common analytical lens. We present the first unified framework for zero-shot RL. Our formulation introduces a consistent notation and taxonomy that organizes existing approaches and allows direct comparison between them. Central to our framework is the classification of algorithms into two families: direct representations, which learn end-to-end mappings from rewards to policies, and compositional representations, which decompose the representation by leveraging the substructure of the value function. Within this framework, we highlight shared principles and key differences across methods, and we derive an extended bound for successor-feature methods, offering a new perspective on their performance in the zero-shot regime. By consolidating existing work under a common lens, our framework provides a principled foundation for future research in zero-shot RL and outlines a clear path toward developing more general agents.
Authors (4)
Jacopo Di Ventura
Jan Felix Kleuker
Aske Plaat
Thomas Moerland
Submitted
October 23, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper presents the first unified framework for zero-shot Reinforcement Learning (RL), providing a consistent notation and taxonomy to organize and compare existing approaches. It classifies algorithms into two families: direct representations, which learn end-to-end mappings from rewards to policies, and compositional representations, which decompose the representation by exploiting the substructure of the value function. This framework offers a common analytical lens for developing general agents that can solve downstream tasks without additional training at test time.
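The compositional family rests on a standard decomposition from the successor-feature (SF) literature: if rewards are approximately linear in some features phi, then Q-values factor as the successor features psi dotted with a task weight vector w, so a new task can be solved zero-shot by inferring w alone. The sketch below illustrates that mechanic with random stand-in arrays; all names and shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the successor-feature decomposition behind
# "compositional representations". Assumptions: phi and psi are
# stand-ins for pretrained feature / successor-feature tables.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 5, 3, 4

# Pretend-pretrained quantities: state-action features phi and their
# discounted successor features psi (what unsupervised pretraining yields).
phi = rng.normal(size=(n_states, n_actions, d))
psi = rng.normal(size=(n_states, n_actions, d))

# At test time a task arrives as rewards. If rewards are linear in phi,
# the task is fully summarized by a weight vector w with r = phi @ w.
w_true = rng.normal(size=d)
rewards = phi @ w_true  # observed reward for each (state, action)

# Infer w by least squares on the observed (feature, reward) pairs.
w_hat, *_ = np.linalg.lstsq(phi.reshape(-1, d), rewards.reshape(-1), rcond=None)

# Zero-shot value estimates: Q(s, a) = psi(s, a) . w -- no further training.
Q = psi @ w_hat
greedy_policy = Q.argmax(axis=1)  # one greedy action index per state
```

Because the rewards here are exactly linear in phi, the regression recovers w and the resulting Q-table is obtained without any additional policy optimization, which is the essence of the zero-shot regime the paper analyzes.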

Business Value

Accelerates the development of more versatile and adaptable AI agents, reducing the need for task-specific training. This leads to more efficient deployment of AI in dynamic environments and a broader range of applications.