Abstract: In many practical sequential decision-making problems, tracking the state of
the environment incurs a sensing/communication/computation cost. In these
settings, the agent's interaction with its environment includes the additional
component of deciding when to sense the state, in a manner that balances the
value associated with optimal (state-specific) actions and the cost of sensing.
We formulate this as an expected discounted cost Markov Decision Process (MDP),
wherein the agent incurs an additional cost for sensing its next state, but has
the option to take actions while remaining 'blind' to the system state. We pose
this problem as a classical discounted cost MDP with an expanded (countably
infinite) state space. While computing the optimal policy for this MDP is
intractable in general, we derive lower bounds on the optimal value function,
which allow us to bound the suboptimality gap of any policy. We also propose SPI, a computationally efficient algorithm based on policy improvement, which in practice performs close to the optimal policy. Finally, we benchmark against
the state-of-the-art via a numerical case study.
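
The abstract frames the sensing decision as part of an expanded-state discounted-cost MDP. As a rough illustration of the trade-off it describes, the following is a minimal sketch (not the paper's SPI algorithm or its lower bounds, and all transition kernels, per-step costs, and the sensing cost are hypothetical placeholders): it builds a small discounted-cost MDP, computes the fully observed optimal values by value iteration, and then compares the approximate cost of sensing now versus acting 'blind' under a belief over the current state.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma = 0.9        # discount factor
sense_cost = 0.3   # hypothetical cost of sensing the next state
rng = np.random.default_rng(0)

# Hypothetical transition kernels P[a][s, s'] and per-step costs cost[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Step 1: value iteration for the fully observed discounted-cost MDP,
# yielding state-specific optimal values V (no sensing cost yet).
V = np.zeros(n_states)
for _ in range(500):
    Q = np.array([[cost[s, a] + gamma * P[a, s] @ V for a in range(n_actions)]
                  for s in range(n_states)])
    V = Q.min(axis=1)

# Step 2: compare, at a belief b over the current state, the approximate cost of
#   (i) sensing now: pay sense_cost, learn the state, then act optimally, versus
#  (ii) acting blind: pick one action that is best in expectation under b, and
#       (as a crude simplification) assume the next state is then revealed for free.
def sense_value(b):
    return sense_cost + b @ V

def blind_value(b):
    return min(b @ cost[:, a] + gamma * (b @ P[a]) @ V for a in range(n_actions))

b = np.ones(n_states) / n_states   # maximally uncertain belief
print("approx. cost if sensing now :", sense_value(b))
print("approx. cost if acting blind:", blind_value(b))
```

In this toy setting, whether sensing is worthwhile depends on how the sensing cost compares with the spread of the optimal values V across states under the current belief.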
Authors
Vansh Kapoor
Jayakrishnan Nair