📄 Abstract
Reinforcement learning (RL) has achieved significant success across a wide range of domains; however, most existing methods are formulated in discrete time. In this work, we introduce a novel RL method for continuous-time control, where stochastic differential equations govern the state-action dynamics. Departing from traditional value-function-based approaches, we characterize continuous-time Q-functions via a martingale condition and link diffusion policy scores to the action gradient of a learned continuous-time Q-function through the dynamic programming principle. This insight motivates Continuous Q-Score Matching (CQSM), a score-based policy improvement algorithm. Notably, our method addresses a long-standing challenge in continuous-time RL: preserving the action-evaluation capability of Q-functions without relying on time discretization. We further derive closed-form solutions for linear-quadratic (LQ) control problems within our framework. Numerical results in simulated environments demonstrate the effectiveness of the proposed method and compare it against popular baselines.
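A worked sketch of the score-to-Q-gradient link described above, under the illustrative assumption of an entropy-regularized (Boltzmann-type) optimal policy with temperature $\gamma > 0$; the precise statement and constants in the paper may differ:

$$
\nabla_a \log \pi^*(a \mid x) \;=\; \frac{1}{\gamma}\,\nabla_a Q(x, a),
$$

where $\pi^*$ is the diffusion-parameterized policy, $Q$ is the continuous-time Q-function characterized by the martingale condition, and $x$, $a$ denote state and action. Under such an identity, matching the score of the diffusion policy to $\nabla_a Q$ performs policy improvement directly in continuous time, without discretizing the dynamics.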
Authors (3)
Chengxiu Hua
Jiawen Gu
Yushun Tang
Submitted
October 20, 2025
Key Contributions
Introduces Continuous Q-Score Matching (CQSM), a reinforcement learning method for continuous-time control problems whose dynamics are governed by SDEs. CQSM characterizes continuous-time Q-functions via a martingale condition and links diffusion policy scores to the action gradient of the Q-function, enabling policy improvement without time discretization. A rough code sketch of this idea follows below.
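As a rough illustration of how such a score-matching update could look in practice, the sketch below regresses a score network onto the action gradient of a learned Q-function. The network names, architectures, and the plain squared-error objective are assumptions for illustration, not the CQSM algorithm as specified in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical networks and dimensions, chosen only for this sketch.
state_dim, action_dim = 4, 2
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))
score_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))

def q_score_matching_loss(states, actions):
    """Illustrative objective || s_theta(x, a) - grad_a Q(x, a) ||^2.

    Matches the policy score model to the action gradient of the learned
    Q-function; not the exact CQSM loss from the paper.
    """
    actions = actions.clone().requires_grad_(True)
    q = q_net(torch.cat([states, actions], dim=-1)).sum()
    # Action gradient of Q, detached so that only the score model is updated here.
    grad_a_q = torch.autograd.grad(q, actions)[0].detach()
    pred_score = score_net(torch.cat([states, actions], dim=-1))
    return ((pred_score - grad_a_q) ** 2).sum(dim=-1).mean()

# Usage sketch with random data.
states = torch.randn(32, state_dim)
actions = torch.randn(32, action_dim)
loss = q_score_matching_loss(states, actions)
loss.backward()
```

In a full method along the lines of the abstract, the Q-function itself would additionally be trained to satisfy the continuous-time (martingale) evaluation condition rather than a discretized Bellman backup.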
Business Value
Enables more precise and efficient control of dynamic systems in real-world applications like robotics, autonomous vehicles, and financial modeling, where continuous-time dynamics are crucial.