DreamerV3-XP introduces two novel extensions to the DreamerV3 architecture: a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. These extensions significantly improve exploration and learning efficiency, particularly in sparse-reward settings, and lead to lower dynamics-model loss.
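A minimal sketch of how these two components could look in practice, assuming a simple weighted sum of the three trajectory statistics as the replay priority and the standard deviation of ensemble reward predictions as the disagreement signal; the function names, weights, and array shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def trajectory_priority(episode_return, recon_loss, value_error,
                        w_return=1.0, w_recon=1.0, w_value=1.0):
    """Score a trajectory for prioritized replay sampling.

    Combines episode return, world-model reconstruction loss, and critic
    value error into one priority. The weighted sum and weights are
    assumptions for illustration.
    """
    return w_return * episode_return + w_recon * recon_loss + w_value * value_error

def intrinsic_reward(ensemble_reward_preds):
    """Disagreement-based intrinsic reward.

    `ensemble_reward_preds` has shape (ensemble_size, batch): each ensemble
    member's predicted environment reward for the same latent states.
    The bonus is the per-state standard deviation across members.
    """
    return np.std(ensemble_reward_preds, axis=0)

# Example: sample a trajectory index proportionally to its priority.
stats = [(3.0, 0.2, 0.5), (0.5, 1.1, 0.9)]  # (return, recon loss, value error)
priorities = np.array([trajectory_priority(r, l, v) for r, l, v in stats])
probs = priorities / priorities.sum()
idx = np.random.choice(len(probs), p=probs)
```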
These improvements enable more efficient training of RL agents on complex tasks, reducing the need for extensive data collection and potentially accelerating the development of autonomous systems in robotics and other fields.