📄 Abstract
Stochastic Gradient Descent (SGD) and its Ruppert-Polyak averaged variant (ASGD) lie at the heart of modern large-scale learning, yet their theoretical properties in high-dimensional settings remain poorly understood. In this paper, we provide rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional regimes. Our key innovation is to transfer powerful tools from high-dimensional time series to online learning. Specifically, by viewing SGD as a nonlinear autoregressive process and adapting existing coupling techniques, we prove the geometric-moment contraction of high-dimensional SGD with constant learning rates, thereby establishing the asymptotic stationarity of the iterates. Building on this, we derive the $q$-th moment convergence of SGD and ASGD for any $q\ge2$ in general $\ell^s$-norms, and, in particular, the $\ell^{\infty}$-norm that is frequently adopted in high-dimensional sparse or structured models. Furthermore, we provide a sharp high-probability concentration analysis that yields probabilistic bounds for high-dimensional ASGD. Beyond closing a critical gap in SGD theory, the proposed framework offers a novel toolkit for analyzing a broad class of high-dimensional learning algorithms.
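To make the autoregressive viewpoint concrete, here is a minimal illustrative formulation; the notation below ($F$, $\eta$, $C$, $\rho$) is ours and not quoted from the paper. Constant learning-rate SGD with step size $\eta$ on a stochastic loss $F(\cdot,\xi)$ driven by an i.i.d. data stream $(\xi_n)$ can be written as an iterated random map,
\[
x_n = x_{n-1} - \eta\,\nabla F(x_{n-1}, \xi_n) =: G_{\xi_n}(x_{n-1}),
\]
and a geometric-moment contraction property in $\ell^s$-norm asks that two trajectories $(x_n)$ and $(x_n')$ driven by the same noise but started from different points satisfy
\[
\bigl(\mathbb{E}\,\|x_n - x_n'\|_s^{\,q}\bigr)^{1/q} \le C\,\rho^{\,n}\,\|x_0 - x_0'\|_s, \qquad \rho\in(0,1),
\]
so the iterates forget their initialization geometrically fast, which underlies the asymptotic stationarity described above.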
Authors (4)
Jiaqi Li
Zhipeng Lou
Johannes Schmidt-Hieber
Wei Biao Wu
Submitted
October 13, 2025
Key Contributions
Provides rigorous statistical guarantees for constant learning-rate SGD and ASGD in high-dimensional settings by adapting tools from high-dimensional time series analysis. Proves geometric-moment contraction and asymptotic stationarity of the iterates, leading to $q$-th moment convergence in general $\ell^s$-norms, including the $\ell^{\infty}$-norm.
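As a concrete illustration of the algorithmic objects studied here, the sketch below runs constant learning-rate SGD with Ruppert-Polyak averaging on a toy streaming least-squares problem; the toy model, default parameters, and function name are assumptions made for illustration and are not taken from the paper.

import numpy as np

def constant_step_asgd(dim=100, n_steps=5000, lr=0.01, noise=0.1, seed=0):
    # Toy streaming least-squares model y_n = a_n' theta* + noise (illustrative only).
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim) / np.sqrt(dim)
    x = np.zeros(dim)      # SGD iterate
    x_bar = np.zeros(dim)  # Ruppert-Polyak running average of the iterates
    for n in range(1, n_steps + 1):
        a = rng.normal(size=dim)                   # fresh covariate from the data stream
        y = a @ theta_star + noise * rng.normal()  # noisy response
        grad = (a @ x - y) * a                     # stochastic gradient of 0.5 * (a'x - y)^2
        x = x - lr * grad                          # constant learning-rate SGD step
        x_bar += (x - x_bar) / n                   # online update of the average
    return theta_star, x, x_bar

theta_star, x_last, x_avg = constant_step_asgd()
print("l_inf error, last SGD iterate:", np.abs(x_last - theta_star).max())
print("l_inf error, averaged (ASGD) :", np.abs(x_avg - theta_star).max())

In this toy setting the averaged iterate typically shows a smaller $\ell^{\infty}$ error than the last iterate, which is the kind of behavior the paper's $\ell^{\infty}$-norm bounds quantify.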
Business Value
Provides a stronger theoretical foundation for using SGD/ASGD in large-scale machine learning, potentially leading to more robust and predictable training of complex models.