Abstract
In this paper, we provide a novel algorithm for solving planning and learning
problems of Markov decision processes. The proposed algorithm follows a policy
iteration-type update by using a rank-one approximation of the transition
probability matrix in the policy evaluation step. This rank-one approximation
is closely related to the stationary distribution of the corresponding
transition probability matrix, which is approximated using the power method. We
provide theoretical guarantees for the convergence of the proposed algorithm to
the optimal (action-)value function, with the same rate and computational complexity
as the value iteration algorithm in the planning problem and as the Q-learning
algorithm in the learning problem. Through our extensive numerical simulations,
however, we show that the proposed algorithm consistently outperforms
first-order algorithms and their accelerated versions for both planning and
learning problems.
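The abstract suggests a two-part computation: a power-method estimate of the stationary distribution of the policy's transition matrix, and a policy evaluation step that replaces that matrix with a rank-one surrogate built from that estimate. The sketch below illustrates one way to read that description in Python for a tabular MDP; the function names, the Sherman-Morrison closed form used for the rank-one evaluation, and the greedy improvement loop are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def stationary_distribution(P, num_iters=50, tol=1e-10):
    """Approximate the stationary distribution of a row-stochastic matrix P
    with the power method (left eigenvector for eigenvalue 1)."""
    n = P.shape[0]
    mu = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        mu_next = mu @ P
        if np.linalg.norm(mu_next - mu, 1) < tol:
            mu = mu_next
            break
        mu = mu_next
    return mu / mu.sum()

def rank_one_policy_evaluation(r_pi, P_pi, gamma, num_power_iters=50):
    """Evaluate a policy under the rank-one surrogate P_pi ~ 1 mu^T, where mu
    is the (power-method) stationary distribution of P_pi.  For a rank-one
    transition matrix, (I - gamma * 1 mu^T)^{-1} has a Sherman-Morrison
    closed form, so the evaluation reduces to a few vector operations."""
    mu = stationary_distribution(P_pi, num_power_iters)
    # V ~ (I - gamma * 1 mu^T)^{-1} r_pi = r_pi + gamma/(1-gamma) * (mu @ r_pi) * 1
    return r_pi + gamma / (1.0 - gamma) * (mu @ r_pi) * np.ones_like(r_pi)

def modified_policy_iteration(P, r, gamma, num_iters=100):
    """Policy-iteration-type loop: greedy improvement plus the rank-one
    approximate evaluation above.  P has shape (A, S, S); r has shape (S, A).
    This is an illustrative sketch, not the paper's stated update rule."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(num_iters):
        Q = r + gamma * np.einsum("ast,t->sa", P, V)       # one-step lookahead
        pi = Q.argmax(axis=1)                               # greedy policy
        P_pi = P[pi, np.arange(S), :]                       # induced transition matrix
        r_pi = r[np.arange(S), pi]                          # induced reward vector
        V = rank_one_policy_evaluation(r_pi, P_pi, gamma)   # rank-one evaluation
    return V, pi
```

Because the stationary distribution sums to one, the rank-one evaluation needs only a dot product and a vector addition per improvement step, which is the kind of cheap approximate policy evaluation that a modified policy-iteration scheme of this sort would exploit.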
Authors
Arman Sharifi Kolarijani
Tolga Ok
Peyman Mohajerin Esfahani
Mohamad Amin Sharif Kolarijani
Key Contributions
Introduces Rank-One Modified Value Iteration, a novel algorithm for MDPs that uses a rank-one approximation of the transition matrix in policy evaluation. It achieves the same convergence rate and complexity as value iteration/Q-learning but consistently outperforms first-order methods in numerical simulations for both planning and learning.
Business Value
Enables faster and more effective training of AI agents for tasks like robotics control, game playing, and resource management, leading to improved decision-making in complex environments.