This paper introduces $\beta$-DQN, a novel and efficient exploration method for Deep Q-Learning that augments standard DQN with a behavior function ($\beta$) estimating action probabilities. This allows $\beta$-DQN to generate a diverse set of policies that balance exploration and bias correction, adaptively select the best policy per episode via a meta-controller, and achieve superior performance with minimal computational overhead.
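The sketch below illustrates the high-level structure described above: a behavior function $\beta$ alongside Q-values, a spectrum of policies ranging from exploratory to bias-corrected greedy, and a per-episode meta-controller. The threshold-masking rule, the epsilon-greedy bandit, and the placeholder `q_values`/`behavior_probs` functions are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

# Placeholder stand-ins for the learned networks; in the paper these are
# deep networks trained alongside DQN.
def q_values(state):
    return rng.normal(size=n_actions)      # placeholder Q(s, .)

def behavior_probs(state):
    p = rng.random(n_actions)               # placeholder beta(. | s)
    return p / p.sum()

def make_policy(threshold):
    """Build one policy on the exploration<->exploitation spectrum.

    Act greedily only among actions the behavior function considers
    well-covered (bias-corrected exploitation); if none qualify, pick
    the least-taken action (exploration). This mixing rule is an
    assumption, not a verbatim port of the paper's method.
    """
    def policy(state):
        q = q_values(state)
        beta = behavior_probs(state)
        mask = beta >= threshold             # actions beta deems well-covered
        if mask.any():
            return int(np.argmax(np.where(mask, q, -np.inf)))
        return int(np.argmin(beta))          # explore the rarest action
    return policy

# A spectrum of policies, from purely greedy (threshold 0, mask always
# satisfiable) to increasingly exploratory (higher thresholds).
policies = [make_policy(t) for t in np.linspace(0.0, 0.5, 6)]

# Meta-controller: a simple epsilon-greedy bandit over episodic returns,
# standing in for the adaptive per-episode selection the paper describes.
returns = np.zeros(len(policies))
counts = np.zeros(len(policies))

def select_policy(eps=0.1):
    if rng.random() < eps or counts.min() == 0:
        return int(rng.integers(len(policies)))
    return int(np.argmax(returns / np.maximum(counts, 1)))

def update_meta(idx, episode_return):
    counts[idx] += 1
    returns[idx] += episode_return

# Usage: pick a policy per episode, run it, feed the return back.
for episode in range(5):
    idx = select_policy()
    policy = policies[idx]
    episode_return = rng.normal()            # placeholder rollout result
    update_meta(idx, episode_return)
```

The key design point this mirrors is that the behavior function is learned from the replay buffer at little extra cost, so the whole policy set shares one Q-network and one $\beta$-network rather than training separate agents.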
In practice, this enables more sample-efficient training of RL agents on complex tasks such as robotics control or game playing, reducing development time and improving final performance.