
Policy Transfer Ensures Fast Learning for Continuous-Time LQR with Entropy Regularization

Abstract

Reinforcement Learning (RL) enables agents to learn optimal decision-making strategies through interaction with an environment, yet training from scratch on complex tasks can be highly inefficient. Transfer learning (TL), widely successful in large language models (LLMs), offers a promising direction for improving RL efficiency by leveraging pre-trained models. This paper investigates policy transfer, a TL approach that initializes learning in a target RL task with a policy from a related source task, in the context of continuous-time linear quadratic regulators (LQRs) with entropy regularization. We provide the first theoretical guarantee of policy transfer for continuous-time RL, proving that a policy optimal for one LQR serves as a near-optimal initialization for closely related LQRs while preserving the original algorithm's convergence rate. Furthermore, we introduce a novel policy learning algorithm for continuous-time LQRs that achieves global linear and local super-linear convergence. Our results demonstrate both the theoretical guarantees and the algorithmic benefits of transfer learning in continuous-time RL, addressing a gap in the existing literature and extending prior work from discrete-time to continuous-time settings. As a byproduct of our analysis, we establish the stability of a class of continuous-time score-based diffusion models via their connection with LQRs.
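
To make the problem setting concrete, the sketch below shows what an entropy-regularized LQR objective looks like under a Gaussian exploratory policy u ~ N(Kx, Σ). It uses a discretized linear system and Monte Carlo rollouts as a stand-in for the paper's continuous-time formulation; the matrices, horizon, discount factor, regularization weight, and gain are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch (not the paper's formulation): entropy-regularized LQR
# cost under a Gaussian policy u ~ N(Kx, Sigma), estimated by Monte Carlo
# rollouts of a discretized linear system. All parameters are assumptions.
import numpy as np

def entropy_regularized_cost(A, B, Q, R, K, Sigma, lam,
                             x0, horizon=200, gamma=0.99,
                             n_rollouts=500, seed=0):
    """Average discounted cost sum_k gamma^k (x'Qx + u'Ru - lam * H(pi(.|x)))."""
    rng = np.random.default_rng(seed)
    m = B.shape[1]
    # Differential entropy of N(mean, Sigma); it does not depend on the mean.
    entropy = 0.5 * np.log(((2 * np.pi * np.e) ** m) * np.linalg.det(Sigma))
    chol = np.linalg.cholesky(Sigma)
    total = 0.0
    for _ in range(n_rollouts):
        x, cost = x0.copy(), 0.0
        for k in range(horizon):
            u = K @ x + chol @ rng.standard_normal(m)      # exploratory Gaussian policy
            stage = x @ Q @ x + u @ R @ u - lam * entropy   # entropy enters as a bonus
            cost += (gamma ** k) * stage
            x = A @ x + B @ u                               # discretized linear dynamics
        total += cost
    return total / n_rollouts

# Toy 2-state / 1-input example with a stabilizing gain (all values illustrative).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[-0.2, -0.5]])
Sigma = 0.1 * np.eye(1)
print(entropy_regularized_cost(A, B, Q, R, K, Sigma, lam=0.1, x0=np.ones(2)))
```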
Authors (2)
Xin Guo
Zijiu Lyu
Submitted
October 16, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper provides the first theoretical guarantee for policy transfer in continuous-time reinforcement learning, specifically for linear quadratic regulators (LQRs) with entropy regularization. It proves that a policy optimal for one LQR serves as a near-optimal initialization for closely related LQRs while preserving the original algorithm's convergence rate, and it introduces a novel policy learning algorithm with global linear and local super-linear convergence.
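
The warm-start idea can be illustrated with a minimal sketch. It uses classic discrete-time LQR policy iteration (Hewer's method) as a stand-in for the paper's continuous-time, entropy-regularized algorithm: the optimal gain of a source LQR initializes policy iteration on a nearby target LQR. The system matrices, perturbation size, and tolerances below are assumptions made for the example.

```python
# Illustrative sketch of policy transfer (not the paper's continuous-time
# algorithm): discrete-time LQR policy iteration, warm-started with the
# optimal gain of a nearby source LQR. All matrices and tolerances are
# assumptions made for this example.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_evaluation(A, B, Q, R, K):
    """Cost-to-go matrix P of the fixed linear policy u = K x."""
    A_cl = A + B @ K
    # Solves P = (A+BK)' P (A+BK) + Q + K'RK  (discrete Lyapunov equation).
    return solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)

def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """LQR policy iteration from a stabilizing gain K0; returns (K, #iterations)."""
    K = K0
    for it in range(max_iter):
        P = policy_evaluation(A, B, Q, R, K)
        K_new = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy improvement
        if np.linalg.norm(K_new - K) < tol:
            return K_new, it + 1
        K = K_new
    return K, max_iter

rng = np.random.default_rng(0)
n, m = 4, 2
A_src = 0.7 * np.eye(n) + 0.05 * rng.standard_normal((n, n))  # stable source system
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)
A_tgt = A_src + 0.02 * rng.standard_normal((n, n))            # nearby target system

# Optimal source gain, then target learning with and without policy transfer.
K_src, _ = policy_iteration(A_src, B, Q, R, np.zeros((m, n)))
_, iters_transfer = policy_iteration(A_tgt, B, Q, R, K_src)
_, iters_scratch = policy_iteration(A_tgt, B, Q, R, np.zeros((m, n)))
print("iterations with transferred policy:", iters_transfer)
print("iterations from scratch:           ", iters_scratch)
```

Because the transferred gain is already close to the target optimum, the warm-started run typically begins inside the fast local convergence regime, which is the intuition behind the paper's initialization result (the paper's own setting and algorithm are continuous-time and entropy-regularized, unlike this toy).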

Business Value

Enables faster and more reliable training of control systems and autonomous agents, reducing development time and improving performance in real-world applications.