This paper explores using diffusion models to represent the successor state measure (SSM) of a policy in offline reinforcement learning and imitation learning. By enforcing the Bellman flow constraints that the SSM must satisfy, the authors derive a simple Bellman-style update on the diffusion step distribution, integrating the expressive power of generative models with core RL principles.
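For context, a minimal sketch of the standard successor state measure and the Bellman flow constraint it satisfies (textbook definitions, not necessarily the paper's exact notation; the paper's contribution is to parameterize this measure with a diffusion model and turn the constraint into an update over diffusion steps):

```latex
% Successor state measure of policy \pi: normalized discounted
% occupancy of future states s^+ starting from (s, a).
m^{\pi}(s^{+} \mid s, a)
  = (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\,
    \Pr\!\left(s_t = s^{+} \mid s_0 = s,\, a_0 = a,\, \pi\right)

% Bellman flow constraint: the measure at (s, a) mixes one-step
% transitions with the measure at the next state-action pair.
m^{\pi}(s^{+} \mid s, a)
  = (1-\gamma)\, P(s^{+} \mid s, a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\;
                         a' \sim \pi(\cdot \mid s')}
    \!\left[\, m^{\pi}(s^{+} \mid s', a') \,\right]
```

Because the right-hand side is a convex mixture of known quantities and the measure itself, enforcing it on a generative model yields a fixed-point (Bellman-style) training target rather than requiring Monte Carlo rollouts.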
This approach could lead to more robust and capable AI agents in domains like robotics and autonomous systems by improving how policies are learned and represented, especially from offline data.