
DreamerV3-XP: Optimizing exploration through uncertainty estimation

Abstract

We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. It adds (i) a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. DreamerV3-XP is evaluated on a subset of Atari100k and DeepMind Control Visual Benchmark tasks, confirming the original DreamerV3 results and showing that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.
Authors (4)
Lukas Bierling
Davide Pasero
Jan-Henrik Bertrand
Kiki Van Gerwen
Submitted
October 24, 2025
arXiv Category
cs.LG

Key Contributions

DreamerV3-XP extends the DreamerV3 architecture with two components: a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. Together, these improve exploration and learning efficiency, yielding faster learning and lower dynamics model loss, particularly in sparse-reward settings.
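The two mechanisms can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the paper states only that trajectories are scored by return, reconstruction loss, and value error, and that the intrinsic reward is based on ensemble disagreement over predicted rewards. The weighted-sum priority, the variance-based disagreement measure, and all function names and weights below are assumptions.

```python
import numpy as np

def trajectory_priority(episode_return, recon_loss, value_error,
                        w_return=1.0, w_recon=1.0, w_value=1.0):
    """Score a trajectory for prioritized replay sampling.

    Combines the three signals named in the paper (return,
    reconstruction loss, value error). The linear combination and
    equal default weights are illustrative assumptions.
    """
    return (w_return * episode_return
            + w_recon * recon_loss
            + w_value * value_error)

def disagreement_reward(ensemble_reward_preds):
    """Intrinsic reward from disagreement across a world-model ensemble.

    ensemble_reward_preds: array of shape (n_models,), each entry one
    model's predicted environment reward for the same latent state.
    Using the variance across the ensemble as the disagreement measure
    is an assumption; high variance means the models disagree, so the
    state is rewarded as worth exploring.
    """
    return float(np.var(np.asarray(ensemble_reward_preds)))

def sample_trajectory(priorities, rng):
    """Sample one trajectory index with probability proportional to priority."""
    p = np.asarray(priorities, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), p=p)
```

In this sketch, high-priority trajectories (high return, or high model error indicating the world model has more to learn from them) are replayed more often, while the disagreement reward is added to the environment reward during imagination to steer the policy toward states the ensemble is uncertain about.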

Business Value

Enables more efficient training of RL agents for complex tasks, reducing the need for extensive data collection and potentially accelerating the development of autonomous systems in robotics and other fields.