This paper introduces a principled two-stage framework for leveraging offline data to accelerate online reinforcement learning. It proposes learning data-driven value envelopes (upper and lower bounds on the value function) from the offline data and incorporating them into online algorithms, yielding a more flexible and tighter approximation than fixed shaping functions.
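One plausible way to use such envelopes online is to clip the bootstrapped value target to the offline-derived bounds, so early online updates cannot drift outside what the offline data supports. The sketch below is illustrative only: the tabular setting, the function name `envelope_q_update`, and the arrays `V_lower` and `V_upper` are assumptions, not the paper's exact construction.

```python
import numpy as np

def envelope_q_update(Q, V_lower, V_upper, s, a, r, s_next,
                      alpha=0.1, gamma=0.99):
    """One Q-learning step with the bootstrapped value clipped to an
    offline-learned envelope [V_lower, V_upper] (hypothetical sketch)."""
    # Standard greedy bootstrap from the current Q-table.
    v_next = np.max(Q[s_next])
    # Constrain the next-state value to the offline envelope.
    v_next = np.clip(v_next, V_lower[s_next], V_upper[s_next])
    # TD update toward the clipped target.
    target = r + gamma * v_next
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

A usage example: initialize `Q` as a zero array of shape `(num_states, num_actions)`, precompute `V_lower` and `V_upper` from the offline dataset, and call `envelope_q_update` on each online transition; as the envelope tightens, the clipping constrains exploration-phase value estimates more aggressively.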
This enables faster and more sample-efficient training of RL agents, reducing the need for extensive real-world interaction and potentially lowering development costs for autonomous systems.