📄 Abstract
Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD with constant learning rates, thereby establishing the asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q\ge2$ in general $\ell^s$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis that yields probabilistic bounds for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, the proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
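To make the autoregressive viewpoint concrete, here is a minimal illustrative formulation; the notation below ($F$, $\eta$, $C$, $\rho$) is ours and not quoted from the paper. Constant learning-rate SGD with step size $\eta$ on a stochastic loss $F(\cdot,\xi)$ driven by an i.i.d. data stream $(\xi_n)$ can be written as an iterated random map,
\[
x_n = x_{n-1} - \eta\,\nabla F(x_{n-1}, \xi_n) =: G_{\xi_n}(x_{n-1}),
\]
and a geometric-moment contraction property in $\ell^s$-norm asks that two trajectories $(x_n)$ and $(x_n')$ driven by the same noise but started from different points satisfy
\[
\bigl(\mathbb{E}\,\|x_n - x_n'\|_s^{\,q}\bigr)^{1/q} \le C\,\rho^{\,n}\,\|x_0 - x_0'\|_s, \qquad \rho\in(0,1),
\]
so the iterates forget their initialization geometrically fast, which underlies the asymptotic stationarity described above.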
Authors (4)
Jiaqi Li
Zhipeng Lou
Johannes Schmidt-Hieber
Wei Biao Wu
Submitted
October 13, 2025
Key Contributions
Provides rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional settings by adapting tools from high-dimensional time series analysis. Proves geometric-moment contraction and asymptotic stationarity of the iterates, leading to $q$-th moment convergence in general $\ell^s$-norms, including the $\ell^{\infty}$-norm.
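As a concrete illustration of the algorithmic objects studied here, the sketch below runs constant learning-rate SGD with Ruppert-Polyak averaging on a toy streaming least-squares problem; the toy model, default parameters, and function name are assumptions made for illustration and are not taken from the paper.

import numpy as np

def constant_step_asgd(dim=100, n_steps=5000, lr=0.01, noise=0.1, seed=0):
    # Toy streaming least-squares model y_n = a_n' theta* + noise (illustrative only).
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim) / np.sqrt(dim)
    x = np.zeros(dim)      # SGD iterate
    x_bar = np.zeros(dim)  # Ruppert-Polyak running average of the iterates
    for n in range(1, n_steps + 1):
        a = rng.normal(size=dim)                   # fresh covariate from the data stream
        y = a @ theta_star + noise * rng.normal()  # noisy response
        grad = (a @ x - y) * a                     # stochastic gradient of 0.5 * (a'x - y)^2
        x = x - lr * grad                          # constant learning-rate SGD step
        x_bar += (x - x_bar) / n                   # online update of the average
    return theta_star, x, x_bar

theta_star, x_last, x_avg = constant_step_asgd()
print("l_inf error, last SGD iterate:", np.abs(x_last - theta_star).max())
print("l_inf error, averaged (ASGD) :", np.abs(x_avg - theta_star).max())

In this toy setting the averaged iterate typically shows a smaller $\ell^{\infty}$ error than the last iterate, which is the kind of behavior the paper's $\ell^{\infty}$-norm bounds quantify.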
Business Value
Provides a stronger theoretical foundation for using SGD/ASGD in large-scale machine learning, potentially leading to more robust and predictable training of complex models.