
Policy Transfer Ensures Fast Learning for Continuous-Time LQR with Entropy Regularization

Abstract

Reinforcement Learning (RL) enables agents to learn optimal decision-making strategies through interaction with an environment, yet training from scratch on complex tasks can be highly inefficient. Transfer learning (TL), widely successful in large language models (LLMs), offers a promising direction for improving RL efficiency by leveraging pre-trained models. This paper investigates policy transfer, a TL approach that initializes learning in a target RL task with a policy from a related source task, in the context of continuous-time linear quadratic regulators (LQRs) with entropy regularization. We provide the first theoretical guarantee of policy transfer for continuous-time RL, proving that a policy optimal for one LQR serves as a near-optimal initialization for closely related LQRs while preserving the original algorithm's convergence rate. Furthermore, we introduce a novel policy learning algorithm for continuous-time LQRs that achieves global linear and local super-linear convergence. Our results demonstrate both the theoretical guarantees and the algorithmic benefits of transfer learning in continuous-time RL, addressing a gap in the existing literature and extending prior work from discrete-time to continuous-time settings. As a byproduct of our analysis, we establish the stability of a class of continuous-time score-based diffusion models via their connection with LQRs.
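
To make the problem setting concrete, the sketch below shows what an entropy-regularized LQR objective looks like under a Gaussian exploratory policy u ~ N(Kx, Σ). It uses a discretized linear system and Monte Carlo rollouts as a stand-in for the paper's continuous-time formulation; the matrices, horizon, discount factor, regularization weight, and gain are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch (not the paper's formulation): entropy-regularized LQR
# cost under a Gaussian policy u ~ N(Kx, Sigma), estimated by Monte Carlo
# rollouts of a discretized linear system. All parameters are assumptions.
import numpy as np

def entropy_regularized_cost(A, B, Q, R, K, Sigma, lam,
                             x0, horizon=200, gamma=0.99,
                             n_rollouts=500, seed=0):
    """Average discounted cost sum_k gamma^k (x'Qx + u'Ru - lam * H(pi(.|x)))."""
    rng = np.random.default_rng(seed)
    m = B.shape[1]
    # Differential entropy of N(mean, Sigma); it does not depend on the mean.
    entropy = 0.5 * np.log(((2 * np.pi * np.e) ** m) * np.linalg.det(Sigma))
    chol = np.linalg.cholesky(Sigma)
    total = 0.0
    for _ in range(n_rollouts):
        x, cost = x0.copy(), 0.0
        for k in range(horizon):
            u = K @ x + chol @ rng.standard_normal(m)      # exploratory Gaussian policy
            stage = x @ Q @ x + u @ R @ u - lam * entropy   # entropy enters as a bonus
            cost += (gamma ** k) * stage
            x = A @ x + B @ u                               # discretized linear dynamics
        total += cost
    return total / n_rollouts

# Toy 2-state / 1-input example with a stabilizing gain (all values illustrative).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[-0.2, -0.5]])
Sigma = 0.1 * np.eye(1)
print(entropy_regularized_cost(A, B, Q, R, K, Sigma, lam=0.1, x0=np.ones(2)))
```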
Authors (2)
Xin Guo
Zijiu Lyu
Submitted
October 16, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper provides the first theoretical guarantee for policy transfer in continuous-time reinforcement learning, specifically for linear quadratic regulators (LQRs) with entropy regularization. It proves that a policy optimal for one LQR serves as a near-optimal initialization for closely related LQRs while preserving the original algorithm's convergence rate, and it introduces a novel policy learning algorithm with global linear and local super-linear convergence.
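
The warm-start idea can be illustrated with a minimal sketch. It uses classic discrete-time LQR policy iteration (Hewer's method) as a stand-in for the paper's continuous-time, entropy-regularized algorithm: the optimal gain of a source LQR initializes policy iteration on a nearby target LQR. The system matrices, perturbation size, and tolerances below are assumptions made for the example.

```python
# Illustrative sketch of policy transfer (not the paper's continuous-time
# algorithm): discrete-time LQR policy iteration, warm-started with the
# optimal gain of a nearby source LQR. All matrices and tolerances are
# assumptions made for this example.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_evaluation(A, B, Q, R, K):
    """Cost-to-go matrix P of the fixed linear policy u = K x."""
    A_cl = A + B @ K
    # Solves P = (A+BK)' P (A+BK) + Q + K'RK  (discrete Lyapunov equation).
    return solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)

def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """LQR policy iteration from a stabilizing gain K0; returns (K, #iterations)."""
    K = K0
    for it in range(max_iter):
        P = policy_evaluation(A, B, Q, R, K)
        K_new = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy improvement
        if np.linalg.norm(K_new - K) < tol:
            return K_new, it + 1
        K = K_new
    return K, max_iter

rng = np.random.default_rng(0)
n, m = 4, 2
A_src = 0.7 * np.eye(n) + 0.05 * rng.standard_normal((n, n))  # stable source system
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)
A_tgt = A_src + 0.02 * rng.standard_normal((n, n))            # nearby target system

# Optimal source gain, then target learning with and without policy transfer.
K_src, _ = policy_iteration(A_src, B, Q, R, np.zeros((m, n)))
_, iters_transfer = policy_iteration(A_tgt, B, Q, R, K_src)
_, iters_scratch = policy_iteration(A_tgt, B, Q, R, np.zeros((m, n)))
print("iterations with transferred policy:", iters_transfer)
print("iterations from scratch:           ", iters_scratch)
```

Because the transferred gain is already close to the target optimum, the warm-started run typically begins inside the fast local convergence regime, which is the intuition behind the paper's initialization result (the paper's own setting and algorithm are continuous-time and entropy-regularized, unlike this toy).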

Business Value

Enables faster and more reliable training of control systems and autonomous agents, reducing development time and improving performance in real-world applications.