LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies

Abstract

Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the unconditional mean, at the LRT-gated mean, or a blend, exposing a continuum from exploitation to conservatism. We standardize states and actions consistently at train and test time and report a state-conditional out-of-distribution (OOD) metric alongside return. On D4RL MuJoCo tasks, LRT-Diffusion improves the return-OOD trade-off over strong Q-guided baselines in our implementation while honoring the desired alpha. Theoretically, we establish level-alpha calibration, concise stability bounds, and a return comparison showing when LRT surpasses Q-guidance, especially when off-support errors dominate. Overall, LRT-Diffusion is a drop-in, inference-time method that adds principled, calibrated risk control to diffusion policies for offline RL.
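
The abstract suggests the following shape for the sampler: accumulate a log-likelihood ratio across denoising steps, pass it through a logistic gate thresholded at tau, and use the gate to blend the unconditional and conditional denoising means. Below is a minimal Python sketch of that reading. The function names (mu_uncond, mu_cond), the isotropic-Gaussian form of the per-step log-likelihood ratio, and the gate temperature are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def lrt_guided_sample(mu_uncond, mu_cond, state, T, sigmas, tau, action_dim,
                      temp=1.0, rng=None):
    """One trajectory of LRT-gated DDPM sampling (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(action_dim)   # x_T ~ N(0, I)
    llr = 0.0                             # accumulated log-likelihood ratio
    prev = None                           # (mu_u, mu_c, sigma) from the previous step

    for t in reversed(range(T)):
        mu_u = mu_uncond(x, t)            # denoising mean from the unconditional prior head
        mu_c = mu_cond(x, state, t)       # denoising mean from the state-conditional head

        # Sequential-test update: score the current sample under the two Gaussian
        # hypotheses proposed at the previous step (an assumed isotropic form).
        if prev is not None:
            pu, pc, ps = prev
            llr += (np.sum((x - pu) ** 2) - np.sum((x - pc) ** 2)) / (2.0 * ps ** 2)

        # Logistic controller: the gate opens only as accumulated evidence exceeds
        # tau, which is calibrated once under H0 to a Type-I level alpha.
        gate = 1.0 / (1.0 + np.exp(-(llr - tau) / temp))

        # Evidence-driven adjustment: interpolate toward the conditional mean
        # instead of applying a fixed guidance push.
        mu = mu_u + gate * (mu_c - mu_u)
        x = mu + sigmas[t] * rng.standard_normal(action_dim)
        prev = (mu_u, mu_c, sigmas[t])

    return x
```

In this sketch the gate stays close to zero until the accumulated evidence approaches tau and then saturates toward the conditional head, which matches the abstract's description of guidance as an evidence-driven adjustment rather than a fixed push.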
Authors: Ximan Sun, Xiang Cheng
Submitted: October 28, 2025
arXiv category: cs.LG

Key Contributions

LRT-Diffusion introduces a risk-aware sampling rule for diffusion policies in offline RL that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Guidance is calibrated once to a user-specified Type-I error level alpha, yielding an evidence-driven adjustment with an interpretable risk budget that composes naturally with Q-gradient guidance.
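
The level-alpha calibration can be read as choosing tau from the null distribution of the accumulated log-likelihood-ratio statistic. Here is a minimal sketch, assuming a Monte Carlo calibration in which tau is the (1 - alpha) quantile of statistics collected from unconditional (H0) rollouts; the paper's exact calibration procedure may differ.

```python
import numpy as np

def calibrate_tau(null_llr_stats, alpha):
    """Set tau to the (1 - alpha) quantile of the accumulated-LLR statistic under H0,
    so the gate fires with probability at most alpha when the unconditional prior is
    in fact the data-generating model (assumed quantile-based calibration)."""
    return float(np.quantile(np.asarray(null_llr_stats), 1.0 - alpha))

# Example: tau = calibrate_tau(null_llr_stats, alpha=0.05)
# where null_llr_stats are accumulated-LLR values recorded from rollouts of the
# unconditional chain with guidance disabled, i.e. simulated draws from H0.
```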

Business Value

Enables safer and more reliable autonomous systems by providing a principled, calibrated way to manage risk at policy execution time. This is crucial for applications where safety is paramount, such as robotics and autonomous driving.