Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents

Abstract

We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned native keyboard-mouse inputs. Unlike API- or GUI-based approaches, this paradigm enables large-scale continual pre-training across heterogeneous domains, including OS, web, and simulation games. Game-TARS is pre-trained on over 500B tokens with diverse trajectories and multimodal data. Key techniques include a decaying continual loss to reduce causal confusion and an efficient Sparse-Thinking strategy that balances reasoning depth and inference cost. Experiments show that Game-TARS achieves about twice the success rate of the previous state-of-the-art model on open-world Minecraft tasks, comes close to the generality of first-time human players in unseen web 3D games, and outperforms GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet on FPS benchmarks. Scaling results at both training time and test time confirm that the unified action space sustains improvements when scaled to cross-game and multimodal data. Our results demonstrate that simple, scalable action representations combined with large-scale pre-training provide a promising path toward generalist agents with broad computer-use abilities.
Authors (27)
Zihao Wang
Xujing Li
Yining Ye
Junjie Fang
Haoming Wang
Longxiang Liu
+21 more
Submitted
October 27, 2025
arXiv Category
cs.AI

Key Contributions

Game-TARS is a generalist game agent trained with a unified, scalable action space anchored to human-aligned keyboard-mouse inputs, enabling large-scale continual pre-training across OS, web, and game domains. Key innovations include a decaying continual loss for reduced causal confusion and an efficient Sparse-Thinking strategy to balance reasoning depth and inference cost.
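
To make the idea of a unified keyboard-mouse action space more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the class names (`MouseMove`, `MouseClick`, `KeyPress`), the `to_text` helper, and the text format are all hypothetical, chosen only to show how trajectories from a game and from a web page could share one low-level action vocabulary.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical action primitives; names and fields are assumptions,
# not taken from the Game-TARS paper.
@dataclass(frozen=True)
class MouseMove:
    dx: int  # relative horizontal movement in pixels
    dy: int  # relative vertical movement in pixels


@dataclass(frozen=True)
class MouseClick:
    button: str  # "left", "right", or "middle"


@dataclass(frozen=True)
class KeyPress:
    keys: Tuple[str, ...]  # keys pressed together, e.g. ("ctrl", "c")


Action = Union[MouseMove, MouseClick, KeyPress]


def to_text(action: Action) -> str:
    """Serialize an action to a flat string a sequence model could emit."""
    if isinstance(action, MouseMove):
        return f"mouse_move(dx={action.dx}, dy={action.dy})"
    if isinstance(action, MouseClick):
        return f"mouse_click(button={action.button})"
    if isinstance(action, KeyPress):
        return "key_press(keys=" + "+".join(action.keys) + ")"
    raise TypeError(f"unsupported action: {action!r}")


# The same three primitives describe a step in a 3D game...
game_steps = [KeyPress(("w",)), MouseMove(12, -3), MouseClick("left")]
# ...and a step in a web or desktop task, with no domain-specific API.
web_steps = [MouseMove(40, 8), MouseClick("left"), KeyPress(("ctrl", "c"))]

for step in game_steps + web_steps:
    print(to_text(step))
```

Because both trajectories use the same primitives, data from different domains can be mixed in one training corpus without per-environment action schemas, which is the property the summary above attributes to the unified action space.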

Business Value

Paves the way for more versatile AI agents that can automate complex tasks across various digital environments, from gaming to software control.