Rank-One Modified Value Iteration

Abstract

In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees for the convergence of the proposed algorithm to the optimal (action-)value function with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through our extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems.
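The planning update described in the abstract can be illustrated with a small pure-Python sketch. It is a reading of the abstract, not the authors' reference implementation: it assumes the rank-one approximation of the greedy policy's transition matrix has identical rows equal to an estimate `d` of the stationary distribution, that `d` is refreshed with one power-method step per iteration, and that the resulting policy-evaluation step reduces to a Bellman backup plus a constant shift. The toy MDP (`P`, `R`, `GAMMA`) and all function names are hypothetical.

```python
"""Sketch of a rank-one modified value iteration on a tiny MDP (illustrative only)."""

# Hypothetical 3-state, 2-action MDP. P[a][s][t] is the probability of moving
# from state s to state t under action a; R[s][a] is the immediate reward.
P = [
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],
    [[0.0, 0.0, 1.0], [0.25, 0.5, 0.25], [0.0, 1.0, 0.0]],
]
R = [[1.0, 0.0], [0.0, 2.0], [0.5, 1.0]]
GAMMA = 0.9


def greedy_backup(P, R, gamma, v):
    """Return the Bellman backup Tv and the greedy policy with respect to v."""
    n, m = len(v), len(P)
    tv, policy = [], []
    for s in range(n):
        q = [R[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
             for a in range(m)]
        best = max(range(m), key=lambda a: q[a])
        tv.append(q[best])
        policy.append(best)
    return tv, policy


def value_iteration(P, R, gamma, iters=2000):
    """Standard value iteration, used here only as a reference solution."""
    v = [0.0] * len(R)
    for _ in range(iters):
        v, _policy = greedy_backup(P, R, gamma, v)
    return v


def rank_one_value_iteration(P, R, gamma, iters=500):
    """Value iteration with an assumed rank-one policy-evaluation correction."""
    n = len(R)
    v = [0.0] * n
    d = [1.0 / n] * n  # initial guess for the stationary distribution
    for _ in range(iters):
        tv, policy = greedy_backup(P, R, gamma, v)
        # One power-method step under the greedy policy: d <- P_pi^T d.
        d = [sum(d[s] * P[policy[s]][s][t] for s in range(n)) for t in range(n)]
        # Assumed correction: shift every state by the d-weighted Bellman
        # residual, scaled by gamma / (1 - gamma).
        shift = gamma / (1.0 - gamma) * sum(d[s] * (tv[s] - v[s]) for s in range(n))
        v = [x + shift for x in tv]
    return v
```

Under these assumptions the correction costs one extra matrix-vector product and one inner product per iteration, so it stays in the same complexity class as a plain Bellman backup. Calling `rank_one_value_iteration(P, R, GAMMA)` and `value_iteration(P, R, GAMMA)` on the toy problem gives a quick check that both reach the same value function.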
Authors (4)
Arman Sharifi Kolarijani
Tolga Ok
Peyman Mohajerin Esfahani
Mohamad Amin Sharif Kolarijani
Submitted
May 3, 2025
arXiv Category
math.OC

Key Contributions

Introduces Rank-One Modified Value Iteration, a novel algorithm for MDPs that uses a rank-one approximation of the transition probability matrix in the policy evaluation step. The algorithm provably matches the convergence rate and computational complexity of value iteration in the planning problem and of Q-learning in the learning problem, and in the authors' numerical simulations it consistently outperforms first-order methods and their accelerated versions on both.
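One way to see why a rank-one approximation can keep the cost at the level of value iteration is the following short derivation. It is a sketch under stated assumptions, not a formula taken from this page: $T$ denotes the Bellman operator, $\pi$ the greedy policy at $v$, $\gamma \in (0,1)$ the discount factor, and the rank-one approximation is assumed to be $P_\pi \approx \mathbf{1} d^\top$ with $d^\top \mathbf{1} = 1$.

```latex
% Exact policy-evaluation step of policy iteration, written as a correction of v:
v^{+} = v + (I - \gamma P_\pi)^{-1} (T v - v).

% Assumed rank-one approximation P_\pi \approx \mathbf{1} d^\top with d^\top \mathbf{1} = 1.
% By the Sherman--Morrison formula,
(I - \gamma \mathbf{1} d^\top)^{-1} = I + \frac{\gamma}{1 - \gamma}\, \mathbf{1} d^\top .

% Substituting gives a Bellman backup plus a constant shift:
v^{+} = T v + \frac{\gamma}{1 - \gamma} \bigl( d^\top (T v - v) \bigr) \mathbf{1}.
```

In this form the matrix inverse disappears, and the only work beyond the Bellman backup is the inner product $d^\top (Tv - v)$ together with whatever is spent updating $d$.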

Business Value

Could enable faster training of AI agents for tasks such as robotics control, game playing, and resource management, which may in turn improve decision-making in complex environments.