
Learn to Change the World: Multi-level Reinforcement Learning with Model-Changing Actions

Abstract

Reinforcement learning usually assumes a given, or sometimes even fixed, environment in which an agent seeks an optimal policy to maximize its long-term discounted reward. In contrast, we consider agents that are not limited to passive adaptation: they have model-changing actions that actively modify the RL model of world dynamics itself. Reconfiguring the underlying transition processes can potentially increase the agents' rewards. Motivated by this setting, we introduce the multi-layer configurable time-varying Markov decision process (MCTVMDP). In an MCTVMDP, the lower-level MDP has a non-stationary transition function that is configurable through upper-level model-changing actions. The agent's objective consists of two parts: optimize the configuration policy in the upper-level MDP and optimize the primitive-action policy in the lower-level MDP, so as to jointly improve its expected long-term reward.
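The two-level structure described above can be sketched in code. The following toy example is an illustrative assumption, not the paper's actual formulation: the class name `ToyMCTVMDP`, the `configure`/`step` interface, and the specific transition kernels are all hypothetical, chosen only to show how an upper-level model-changing action swaps the lower-level transition function while primitive actions run underneath it.

```python
import numpy as np

class ToyMCTVMDP:
    """Hypothetical sketch: a 2-state lower-level MDP whose transition
    kernel is reconfigurable through an upper-level model-changing action."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        # Two candidate transition kernels, indexed by the upper-level
        # configuration: kernels[config][state, action] -> next-state probs.
        self.kernels = {
            0: np.array([[[0.9, 0.1], [0.5, 0.5]],
                         [[0.5, 0.5], [0.1, 0.9]]]),
            1: np.array([[[0.1, 0.9], [0.5, 0.5]],
                         [[0.5, 0.5], [0.9, 0.1]]]),
        }
        self.config = 0
        self.state = 0

    def configure(self, config_action):
        # Upper-level (model-changing) action: replace the world dynamics.
        self.config = config_action

    def step(self, primitive_action):
        # Lower-level primitive action under the currently configured kernel.
        probs = self.kernels[self.config][self.state, primitive_action]
        self.state = int(self.rng.choice(2, p=probs))
        reward = 1.0 if self.state == 1 else 0.0  # reward for reaching state 1
        return self.state, reward

env = ToyMCTVMDP(seed=0)
env.configure(1)  # upper level: choose the kernel that favors state 1
total = sum(env.step(0)[1] for _ in range(100))  # fixed primitive policy
print(total)
```

Under configuration 1, primitive action 0 drives the chain toward the rewarding state, so the same lower-level policy collects more reward than it would under configuration 0; this is the sense in which configuring the dynamics and acting within them are jointly optimized.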
Authors (4): Ziqing Lu, Babak Hassibi, Lifeng Lai, Weiyu Xu

Submitted: October 16, 2025

arXiv Category: cs.LG

Key Contributions

Introduces the Multi-layer Configurable Time-Varying Markov Decision Process (MCTVMDP) to model environments where agents can actively change the underlying dynamics. This allows agents to not only adapt to but also proactively modify their environment to maximize rewards, offering a new paradigm for intelligent agents in dynamic settings.

Business Value

Enables more sophisticated autonomous systems that can adapt to and even shape their operational environments, leading to improved efficiency and robustness in complex industrial or robotic applications.