Abstract
Large-scale networked systems, such as traffic, power, and wireless grids,
challenge reinforcement-learning agents with both scale and environment shifts.
To address these challenges, we propose GSAC (Generalizable and Scalable
Actor-Critic), a framework that couples causal representation learning with
meta actor-critic learning to achieve both scalability and domain
generalization. Each agent first learns a sparse local causal mask that
provably identifies the minimal neighborhood variables influencing its
dynamics, yielding exponentially tight approximately compact representations
(ACRs) of state and domain factors. These ACRs bound the error of truncating
value functions to $\kappa$-hop neighborhoods, enabling efficient learning on
graphs. A meta actor-critic then trains a shared policy across multiple source
domains while conditioning on the compact domain factors; at test time, a few
trajectories suffice to estimate the new domain factor and deploy the adapted
policy. We establish finite-sample guarantees on causal recovery, actor-critic
convergence, and the adaptation gap, and show that GSAC adapts rapidly and
significantly outperforms learning-from-scratch and conventional adaptation
baselines.
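
To make the mechanism concrete, the following is a minimal, illustrative sketch (not the authors' code) of two ingredients the abstract describes: restricting each agent's view to its $\kappa$-hop neighborhood on the interaction graph, and estimating a compact domain factor from a few test-time trajectories before acting with the shared, domain-conditioned policy. All function names and the toy linear policy are hypothetical; the ACR theory suggests the truncation error decays exponentially in $\kappa$ (on the order of $\rho^{\kappa+1}$ for some $\rho \in (0,1)$, the standard form in scalable multi-agent RL, assumed here only for intuition).

    # Hedged sketch of kappa-hop truncation and test-time domain-factor
    # estimation; all names and computations are hypothetical stand-ins.
    import numpy as np

    def kappa_hop_neighborhood(adj: np.ndarray, agent: int, kappa: int) -> list:
        """Return indices of all nodes within kappa hops of `agent` via BFS."""
        frontier, visited = {agent}, {agent}
        for _ in range(kappa):
            frontier = {j for i in frontier
                        for j in np.nonzero(adj[i])[0]} - visited
            visited |= frontier
        return sorted(visited)

    def estimate_domain_factor(trajectories: list) -> np.ndarray:
        """Toy stand-in for domain-factor inference: average a feature map
        over a handful of transitions observed in the unseen test domain."""
        feats = np.concatenate(trajectories, axis=0)
        return feats.mean(axis=0)  # compact summary of the new domain

    def policy(local_state: np.ndarray, domain_factor: np.ndarray) -> int:
        """Shared policy conditioned on the domain factor (toy scorer)."""
        scores = local_state.sum() + domain_factor  # placeholder computation
        return int(np.argmax(scores))

    # Usage: a 5-node line graph; agent 2 with kappa=1 sees only {1, 2, 3}.
    adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
    neigh = kappa_hop_neighborhood(adj, agent=2, kappa=1)
    state = np.random.rand(5)
    z_hat = estimate_domain_factor([np.random.rand(10, 3)])  # few trajectories
    action = policy(state[neigh], z_hat)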
Authors
Hao Liang
Shuqing Shi
Yudi Zhang
Biwei Huang
Yali Du
Submitted
October 24, 2025