Closing the Sim2Real Performance Gap in RL

Abstract

Sim2Real aims at training policies in high-fidelity simulation environments and effectively transferring them to the real world. Despite the developments of accurate simulators and Sim2Real RL approaches, the policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize the simulator accuracy and variability as proxies for real-world performance. However, these metrics do not necessarily correlate with the real-world performance of the policy as established theoretically and empirically in the literature. We propose a novel framework to address this issue by directly adapting the simulator parameters based on real-world performance. We frame this problem as a bi-level RL framework: the inner-level RL trains a policy purely in simulation, and the outer-level RL adapts the simulation model and in-sim reward parameters to maximize real-world performance of the in-sim policy. We derive and validate in simple examples the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
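
The bi-level structure described in the abstract can be written compactly as a nested optimization. The notation below is an assumption made for illustration and is not taken from the paper: \theta collects the simulation model and in-sim reward parameters, \pi is a policy, J_{\mathrm{sim}} is the return measured in simulation, and J_{\mathrm{real}} is the return measured on the real system.

```latex
\begin{aligned}
\max_{\theta} \quad & J_{\mathrm{real}}\!\left(\pi^{\star}_{\theta}\right)
  && \text{(outer level: adapt the simulator and in-sim reward)} \\
\text{s.t.} \quad & \pi^{\star}_{\theta} \in \arg\max_{\pi} \; J_{\mathrm{sim}}(\pi;\theta)
  && \text{(inner level: train the policy purely in simulation)}
\end{aligned}
```

In this reading, the real system enters only through the outer objective, while the policy itself never trains on real data.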
Authors (5)
Akhil S Anand
Shambhuraj Sawant
Jasper Hoffmann
Dirk Reinhardt
Sebastien Gros
Submitted: October 20, 2025
arXiv Category: cs.LG

Key Contributions

This paper proposes a framework that addresses the Sim2Real performance gap in reinforcement learning directly, by adapting simulator parameters based on real-world performance. It frames the task as a bi-level RL problem: an inner-level RL agent trains a policy purely in simulation, and an outer-level RL agent tunes the simulation model and in-sim reward parameters to maximize the real-world performance of that policy. This contrasts with current methods, which optimize simulator accuracy and variability as indirect proxies for real-world performance.
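
The loop structure can be illustrated with a deliberately small toy problem. This is a minimal sketch under stated assumptions, not the paper's algorithm: both RL levels are swapped for plain gradient ascent (with a finite-difference gradient at the outer level), the "policy" is a single scalar action, and every name and number (`TARGET`, `real_return`, `train_in_sim`, `bilevel_adapt`, the dynamics coefficients) is hypothetical.

```python
"""Toy bi-level loop: the inner level trains in simulation, the outer level tunes the simulator."""

TARGET = 1.0  # desired next state of the hypothetical task


def real_return(u):
    """Stand-in for a real-world rollout. The true dynamics include an
    offset (0.3) that the one-parameter simulator cannot represent."""
    x_next = 0.5 * u + 0.3
    return -(x_next - TARGET) ** 2


def train_in_sim(theta, steps=500, lr=0.2):
    """Inner level: gradient ascent on the in-sim return
    -(theta * u - TARGET)**2 with respect to the action u."""
    u = 0.0
    for _ in range(steps):
        grad_u = -2.0 * theta * (theta * u - TARGET)
        u += lr * grad_u
    return u


def real_performance(theta):
    """Real-world return of the policy trained in the simulator with parameter theta."""
    return real_return(train_in_sim(theta))


def bilevel_adapt(theta=0.5, outer_steps=200, outer_lr=0.1, eps=1e-4):
    """Outer level: central finite-difference gradient ascent on theta,
    using only real-world returns of the in-sim policy."""
    for _ in range(outer_steps):
        grad_theta = (real_performance(theta + eps)
                      - real_performance(theta - eps)) / (2 * eps)
        theta += outer_lr * grad_theta
    return theta, train_in_sim(theta)


if __name__ == "__main__":
    theta, u = bilevel_adapt()
    print(f"theta={theta:.4f}, action={u:.4f}, real return={real_return(u):.2e}")
```

In this toy, the in-sim optimum is u = 1/theta, so the real return is maximized at theta = 0.5/0.7 ≈ 0.714 rather than at the true gain of 0.5. The simulator parameter that yields the best real-world performance is therefore not the most accurate one, which mirrors the abstract's point that simulator accuracy is only a proxy. In practice each call to `real_return` would be a noisy and costly real-world evaluation, so the finite-difference scheme here is for illustration only.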

Business Value

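If the approach carries over from the simple examples validated in the paper to full-scale systems, it could make the development of robotic systems and autonomous agents more reliable and efficient by reducing the performance lost when policies trained in simulation are deployed in the real world.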