Co-Evolving Latent Action World Models

Abstract

Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train the two jointly, but this is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. It resolves the core challenge in joint learning through a critical warm-up phase that aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.
Authors: Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
Submitted: October 30, 2025
arXiv category: cs.LG

Key Contributions

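CoLA-World enables joint training of a latent action model (LAM) and a world model. This avoids the redundant training of two-stage approaches and the representational collapse that naive joint training is prone to. The key ingredient is a warm-up phase that aligns the representations of the from-scratch LAM with the pre-trained world model. After warm-up, the world model acts as a tutor whose gradients shape the LAM, while the LAM gives the world model a more precise and adaptable control interface, producing a co-evolution cycle.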
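
The warm-up-then-joint schedule can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the scalar weights `w_lam` and `w_wm`, the `train` function, the learning rate, and in particular the choice to freeze the world model during warm-up, which the abstract does not confirm. It is not the CoLA-World architecture or training recipe, only a toy showing how a from-scratch LAM might first be aligned to a fixed world model before both are updated together.

```python
# Toy illustration only: scalar "models", not the CoLA-World architecture.
# Assumption (not from the source): the world model is frozen during warm-up.

def train(pairs, warmup_steps, total_steps, lr=0.05):
    """Run a two-phase schedule and return (step, loss, w_lam, w_wm) per step."""
    w_lam = 0.1  # hypothetical from-scratch LAM weight (inverse dynamics)
    w_wm = 1.5   # hypothetical "pre-trained" world model weight
    history = []
    for step in range(total_steps):
        x, x_next = pairs[step % len(pairs)]
        delta = x_next - x
        action = w_lam * delta    # LAM infers a latent action from the frame pair
        pred = x + w_wm * action  # world model predicts the next frame from the action
        err = pred - x_next
        loss = err ** 2
        # Gradients of the squared error with respect to each weight.
        grad_lam = 2.0 * err * w_wm * delta
        grad_wm = 2.0 * err * action
        w_lam -= lr * grad_lam    # the LAM is updated in both phases
        if step >= warmup_steps:
            w_wm -= lr * grad_wm  # the world model is updated only after warm-up
        history.append((step, loss, w_lam, w_wm))
    return history


if __name__ == "__main__":
    toy_pairs = [(0.0, 1.0), (1.0, 0.5), (0.5, 1.5)]  # made-up (frame, next frame) values
    for step, loss, w_lam, w_wm in train(toy_pairs, warmup_steps=20, total_steps=300)[::50]:
        print(step, loss, w_lam, w_wm)
```

In this toy, the gradient that updates `w_lam` flows through the world model weight `w_wm`, which loosely mirrors the "world model as tutor" idea. Note that a scalar toy cannot exhibit representational collapse, so it shows only the schedule, not the failure mode the warm-up is meant to prevent.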

Business Value

CoLA-World could support more realistic and controllable simulation environments for training AI agents such as robots and autonomous vehicles, reducing the need for expensive real-world data collection and testing.