Establishes a precise boundary for the computational hardness of reinforcement learning with transition lookahead. It proves that planning with one step of lookahead is polynomial-time solvable via linear programming, while lookahead of two or more steps renders the problem NP-hard, highlighting the steep computational cost of exploiting richer predictive information.
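To make the linear-programming angle concrete, the sketch below shows the standard primal LP for computing an MDP's optimal value function, the kind of LP machinery that underpins a polynomial-time solvability result. This is not the paper's lookahead-specific construction (which is not reproduced here); the solver function `solve_mdp_lp`, the toy transition tensor `P`, and the reward matrix `R` are illustrative placeholders.

```python
# A minimal sketch, assuming the standard MDP primal LP:
#   minimize  sum_s V(s)
#   subject to V(s) >= R[s, a] + gamma * P[s, a] @ V   for all s, a
# This illustrates LP-based polynomial-time MDP solving in general,
# not the paper's exact one-step-lookahead formulation.
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, R, gamma=0.95):
    """P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
    Returns the optimal value function V* (shape (S,))."""
    S, A = R.shape
    # One inequality per (s, a): (gamma * P[s, a] - e_s) @ V <= -R[s, a]
    A_ub = np.zeros((S * A, S))
    b_ub = np.zeros(S * A)
    for s in range(S):
        for a in range(A):
            row = gamma * P[s, a].copy()
            row[s] -= 1.0
            A_ub[s * A + a] = row
            b_ub[s * A + a] = -R[s, a]
    res = linprog(c=np.ones(S), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * S, method="highs")
    return res.x

# Toy 2-state, 2-action example (placeholder numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
print(solve_mdp_lp(P, R))
```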
Provides theoretical guidance for designing practical RL systems, helping practitioners judge when incorporating lookahead mechanisms is computationally feasible.