Introduces the Actively Observable Markov Decision Process (AOMDP), in which the agent can choose to measure the state: measuring reveals the state exactly but incurs delayed negative side effects. Proposes an online RL algorithm that maintains belief states, approximated via sequential Monte Carlo, and shows provable improvements in sample efficiency and policy value despite the measurement costs.
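A minimal sketch of the belief-state idea, assuming a particle (sequential Monte Carlo) representation: particles are propagated through the dynamics each step, and if the agent pays to measure, the belief collapses onto the revealed state. All names here (`transition_sample`, `update_belief`, the random-walk dynamics) are illustrative assumptions, not the paper's actual algorithm or API.

```python
import numpy as np

N_PARTICLES = 1000
rng = np.random.default_rng(0)

def transition_sample(states, action):
    """Hypothetical stochastic dynamics: per-particle next-state samples.
    Placeholder: a random walk on integer states shifted by the action."""
    return states + action + rng.integers(-1, 2, size=states.shape)

def update_belief(particles, action, measured_state=None):
    """Propagate the particle belief one step. If the agent chose to
    measure, the state is revealed exactly and the belief collapses."""
    particles = transition_sample(particles, action)
    if measured_state is not None:
        # Measurement reveals the true state: all particles reset to it.
        particles = np.full_like(particles, measured_state)
    return particles

# Usage: start diffuse, act without measuring (belief stays spread out),
# then measure once (belief collapses to the observed state).
belief = rng.integers(0, 10, size=N_PARTICLES)
belief = update_belief(belief, action=1)
belief = update_belief(belief, action=1, measured_state=7)
print("posterior mean:", belief.mean())  # 8.0: all particles at 7, then +1 drift next step
```

The trade-off the AOMDP formalizes shows up directly here: skipping the measurement keeps the belief uncertain (hurting downstream decisions), while measuring sharpens it at the cost of whatever delayed negative effect the measurement action carries.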
Enables more efficient and effective decision-making in real-world scenarios where observing the state is costly or has side effects, such as in healthcare or industrial control systems.