Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

📄 Abstract

We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition number. Surprisingly, we show that this can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ by simply using a large stepsize -- for which the objective evolves nonmonotonically. The acceleration brought by large stepsizes extends to minimizing the population risk for separable distributions, improving on the best-known upper bounds on the number of steps to reach a near-optimum. Finally, we characterize the largest stepsize for the local convergence of GD, which also determines the global convergence in special scenarios. Our results extend the analysis of Wu et al. (2024) from convex settings with minimizers at infinity to strongly convex cases with finite minimizers.
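For context, a standard formalization of this setting (the paper's exact notation and constants may differ): given linearly separable data $(x_i, y_i)$ with $y_i \in \{\pm 1\}$ and regularization strength $\lambda > 0$, the $\ell_2$-regularized logistic regression objective is

$$
F_\lambda(w) \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\bigl(1+\exp(-y_i \langle x_i, w\rangle)\bigr) \;+\; \frac{\lambda}{2}\lVert w\rVert^2 .
$$

This objective is $\lambda$-strongly convex and $(L+\lambda)$-smooth, where $L$ bounds the smoothness of the logistic term, so the relevant condition number scales as $\kappa \approx (L+\lambda)/\lambda$, which is large when $\lambda$ is small.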
Authors (3)
Jingfeng Wu
Pierre Marion
Peter Bartlett
Submitted
June 3, 2025
arXiv Category
stat.ML

Key Contributions

Shows that gradient descent with a large constant stepsize reaches a near-optimum of $\ell_2$-regularized logistic regression on linearly separable data in $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ steps, even though the objective evolves non-monotonically along the way. This improves on the classical $\widetilde{\mathcal{O}}(\kappa)$ rate obtained with small stepsizes chosen to guarantee monotonic descent; see the numerical sketch below.
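The following is a minimal numerical sketch of the phenomenon, not the authors' code: the data, the regularization strength `lam`, and the stepsize choices are illustrative assumptions. It runs plain constant-stepsize GD on a separable toy problem once with a small "safe" stepsize and once with a much larger one, so the (possibly non-monotone) loss trajectory of the large-stepsize run can be inspected directly.

```python
# Minimal sketch of the large-stepsize phenomenon; not the paper's implementation.
# Data, lam, and the stepsize constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate,
# so the direction e_1 separates the classes through the origin.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])
y[y == 0] = 1.0

lam = 1e-3  # l2-regularization strength (illustrative)

def objective(w):
    """l2-regularized logistic loss F_lambda(w)."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.dot(w, w)

def gradient(w):
    margins = y * (X @ w)
    probs = 1.0 / (1.0 + np.exp(np.clip(margins, -500, 500)))  # sigmoid(-margin)
    return -(X.T @ (probs * y)) / n + lam * w

def run_gd(stepsize, num_steps=500):
    """Constant-stepsize gradient descent from zero; returns the loss trajectory."""
    w = np.zeros(d)
    losses = []
    for _ in range(num_steps):
        losses.append(objective(w))
        w -= stepsize * gradient(w)
    return np.array(losses)

# The logistic term is at most (mean ||x_i||^2 / 4)-smooth; the classical "safe"
# stepsize is of order 1/(L + lam), while the large-stepsize regime uses a much
# bigger constant, under which the loss may oscillate before settling.
L = np.mean(np.sum(X**2, axis=1)) / 4.0
small = run_gd(stepsize=1.0 / (L + lam))
large = run_gd(stepsize=20.0 / (L + lam))

print("final loss, small stepsize:", small[-1])
print("final loss, large stepsize:", large[-1])
print("large-stepsize loss monotone?", bool(np.all(np.diff(large) <= 0)))
```

With choices like these, the large-stepsize run may show early oscillations in the loss before converging, which is the non-monotone behavior the paper analyzes; the precise stepsize threshold and the $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ rate are established theoretically in the paper, not by this sketch.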

Business Value

Provides theoretical guidance on stepsize selection that can make training faster and more efficient, particularly for logistic-regression-style models, potentially reducing computational costs.