Abstract
Reinforcement Learning (RL) enables agents to learn optimal decision-making
strategies through interaction with an environment, yet training from scratch
on complex tasks can be highly inefficient. Transfer learning (TL), widely
successful in large language models (LLMs), offers a promising direction for
enhancing RL efficiency by leveraging pre-trained models.
This paper investigates policy transfer, a TL approach that initializes
learning in a target RL task using a policy from a related source task, in the
context of continuous-time linear quadratic regulators (LQRs) with entropy
regularization. We provide the first theoretical proof of policy transfer for
continuous-time RL, showing that a policy optimal for one LQR serves as a
near-optimal initialization for closely related LQRs while preserving the
original algorithm's convergence rate. Furthermore, we introduce a novel policy
learning algorithm for continuous-time LQRs that achieves global linear and
local super-linear convergence. Our results demonstrate both theoretical
guarantees and algorithmic benefits of transfer learning in continuous-time RL,
addressing a gap in the existing literature and extending prior work from
discrete-time to continuous-time settings.
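
For context, the entropy-regularized (sometimes called "exploratory") LQR setting referenced above is usually formulated as follows; this is a minimal sketch in our own notation, and the paper's exact setup may differ.

```latex
% Entropy-regularized continuous-time LQR (standard exploratory
% formulation; notation ours, not necessarily the paper's).
% The state follows a linear SDE driven by a randomized policy:
\[
dX_t = (A X_t + B u_t)\,dt + \sigma\,dW_t, \qquad u_t \sim \pi(\cdot \mid X_t),
\]
% and the objective penalizes quadratic state/control cost while
% rewarding policy entropy (weight \gamma > 0) to encourage exploration:
\[
J(\pi) = \mathbb{E}\!\left[\int_0^T \Bigl(X_t^\top Q X_t + u_t^\top R u_t
  - \gamma\,\mathcal{H}\bigl(\pi(\cdot \mid X_t)\bigr)\Bigr)\,dt
  + X_T^\top G X_T\right].
\]
```

In this setting the optimal relaxed policy is known to be Gaussian with a linear-in-state mean, so policy learning reduces to estimating a feedback gain, which is what makes transferring a gain between nearby LQRs natural.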
As a byproduct of our analysis, we derive the stability of a class of
continuous-time score-based diffusion models via their connection with LQRs.
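
The diffusion-model byproduct is natural because the forward noising process in standard score-based models is an Ornstein-Uhlenbeck-type SDE with linear drift; the sketch below states the standard forward/reverse pair in our notation, which may differ from the paper's.

```latex
% Standard score-based diffusion SDEs (VP-type). Forward noising:
\[
dX_t = -\tfrac{1}{2}\beta(t)\,X_t\,dt + \sqrt{\beta(t)}\,dW_t,
\]
% and the time-reversed sampling dynamics:
\[
d\bar{X}_t = \Bigl[-\tfrac{1}{2}\beta(t)\,\bar{X}_t
  - \beta(t)\,\nabla_x \log p_t(\bar{X}_t)\Bigr]\,dt + \sqrt{\beta(t)}\,d\bar{W}_t.
\]
```

For Gaussian data the score $\nabla_x \log p_t(x)$ is linear in $x$, so the reverse dynamics are themselves linear, which is the kind of structure that lets LQR-style stability arguments apply.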
Submitted October 16, 2025
Key Contributions
This paper provides the first theoretical proof of policy transfer in continuous-time reinforcement learning, specifically for Linear Quadratic Regulators (LQRs) with entropy regularization. It proves that a policy optimal for one LQR serves as a near-optimal initialization for related LQRs while preserving the original algorithm's convergence rate, and it introduces a novel learning algorithm with global linear and local super-linear convergence.
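
As a rough illustration of the warm-start idea, not the paper's algorithm, the sketch below runs Newton-type policy iteration (Hewer's method) on a discrete-time LQR, a common stand-in for the continuous-time setting, and compares a cold start against initializing with the gain learned on a nearby source task. The matrices, the discrete-time setting, and the zero-gain cold start are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """Newton-type policy iteration (Hewer's method) for a
    discrete-time LQR: evaluate the current gain by solving a
    Lyapunov equation, then take one policy-improvement step.
    Converges quadratically near the optimum, a discrete-time
    analogue of the local super-linear rate discussed above."""
    K = K0
    for it in range(1, max_iter + 1):
        A_cl = A - B @ K                      # closed-loop dynamics
        # Policy evaluation: solve A_cl^T P A_cl - P + Q + K^T R K = 0
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement (one Newton step on the Riccati equation)
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.linalg.norm(K_new - K) < tol:
            return K_new, it
        K = K_new
    return K, max_iter

# Source task and a "closely related" target task (illustrative numbers).
A_src = np.array([[0.9, 0.1],
                  [0.0, 0.8]])
A_tgt = A_src + np.array([[0.02, -0.03],
                          [0.01,  0.04]])    # small perturbation
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)

# Learn on the source task, then transfer the gain to the target task.
K_src, _ = policy_iteration(A_src, B, Q, R, np.zeros((1, 2)))
_, iters_cold = policy_iteration(A_tgt, B, Q, R, np.zeros((1, 2)))
_, iters_warm = policy_iteration(A_tgt, B, Q, R, K_src)
print(f"cold start: {iters_cold} iterations, transferred start: {iters_warm}")
```

Because the target dynamics are a small perturbation of the source, the transferred gain typically lands inside the region of quadratic convergence and reaches the same tolerance in fewer iterations, mirroring the paper's claim that a source-optimal policy is a near-optimal initialization for nearby LQRs.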
Business Value
Enables faster and more reliable training of control systems and autonomous agents, reducing development time and improving performance in real-world applications.