Continuous Q-Score Matching: Diffusion Guided Reinforcement Learning for Continuous-Time Control

Abstract

Reinforcement learning (RL) has achieved significant success across a wide range of domains; however, most existing methods are formulated in discrete time. In this work, we introduce a novel RL method for continuous-time control, where stochastic differential equations govern state-action dynamics. Departing from traditional value function-based approaches, our key contribution is the characterization of continuous-time Q-functions via a martingale condition and the linking of diffusion policy scores to the action gradient of a learned continuous Q-function via the dynamic programming principle. This insight motivates Continuous Q-Score Matching (CQSM), a score-based policy improvement algorithm. Notably, our method addresses a long-standing challenge in continuous-time RL: preserving the action-evaluation capability of Q-functions without relying on time discretization. We further provide theoretical closed-form solutions for linear-quadratic (LQ) control problems within our framework. Numerical results in simulated environments demonstrate the effectiveness of our proposed method and compare it to popular baselines.
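To make the setting concrete, the sketch below simulates a scalar linear-quadratic problem in which the state follows a controlled SDE, dX = (a X + b u) dt + sigma dW, under a fixed linear feedback policy u = -k X, and estimates the expected quadratic cost by Monte Carlo. It is not the paper's algorithm or its closed-form LQ solution: the coefficients, cost weights, and the function name `simulate_lq_cost` are all illustrative assumptions, and the Euler-Maruyama time grid is used here only to simulate the SDE.

```python
import numpy as np


def simulate_lq_cost(k, a=0.5, b=1.0, sigma=0.2, q=1.0, r=0.1,
                     x0=1.0, horizon=2.0, dt=1e-3, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[ integral_0^T (q X^2 + r u^2) dt ] for u = -k X.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / dt))
    x = np.full(n_paths, x0, dtype=float)   # one state per simulated path
    cost = np.zeros(n_paths)                # running cost per path

    for _ in range(n_steps):
        u = -k * x                          # linear feedback action
        cost += (q * x**2 + r * u**2) * dt  # left-endpoint quadrature of the cost
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increment
        x = x + (a * x + b * u) * dt + sigma * dw        # Euler-Maruyama step

    return float(cost.mean())


if __name__ == "__main__":
    for gain in (0.0, 1.0, 2.0):
        print(f"k={gain:.1f}  estimated cost={simulate_lq_cost(gain):.3f}")
```

With these assumed coefficients the uncontrolled system (k = 0) is unstable, so a stabilizing gain should yield a noticeably lower estimated cost; comparing gains this way is the kind of action evaluation that a Q-function is meant to provide.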
Authors (3)
Chengxiu Hua
Jiawen Gu
Yushun Tang
Submitted
October 20, 2025
arXiv Category
cs.LG

Key Contributions

Introduces Continuous Q-Score Matching (CQSM), a reinforcement learning method for continuous-time control problems governed by stochastic differential equations (SDEs). The paper characterizes continuous-time Q-functions via a martingale condition and links diffusion policy scores to the action gradient of the learned Q-function, enabling score-based policy improvement without time discretization.
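One way to build intuition for the link between a policy score and the action gradient of a Q-function is the toy sketch below. It assumes a Boltzmann-type policy proportional to exp(Q(s, a) / alpha), whose score with respect to the action is the action gradient of Q divided by alpha, and draws actions with unadjusted Langevin dynamics. The quadratic Q-function, the temperature `alpha`, and both function names are hypothetical; the summary does not state that CQSM takes this exact form.

```python
import numpy as np


def q_action_grad(s, a, k=0.5):
    """Action gradient of a hypothetical Q(s, a) = -0.5 * (a - k * s)**2."""
    return -(a - k * s)


def langevin_sample_actions(s, alpha=0.1, step=1e-2, n_steps=2000,
                            n_samples=5000, seed=0):
    """Sample actions by following the score grad_a Q / alpha (assumed form)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_samples)          # arbitrary initial actions
    for _ in range(n_steps):
        score = q_action_grad(s, a) / alpha # assumed policy score at (s, a)
        noise = rng.normal(size=n_samples)
        a = a + step * score + np.sqrt(2.0 * step) * noise  # Langevin update
    return a


if __name__ == "__main__":
    actions = langevin_sample_actions(s=2.0)
    print("sample mean:", actions.mean(), "sample variance:", actions.var())
```

For this assumed quadratic Q, the target policy is Gaussian with mean k * s and variance alpha, so the samples should concentrate around the action that maximizes Q, up to a small bias from the finite step size.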

Business Value

Could enable more precise and efficient control of dynamic systems in applications such as robotics, autonomous vehicles, and financial modeling, where the underlying dynamics evolve in continuous time.