Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression

📄 Abstract

We study gradient descent (GD) with a constant stepsize for $\ell_2$-regularized logistic regression with linearly separable data. Classical theory suggests small stepsizes to ensure monotonic reduction of the optimization objective, achieving exponential convergence in $\widetilde{\mathcal{O}}(\kappa)$ steps with $\kappa$ being the condition number. Surprisingly, we show that this can be accelerated to $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ by simply using a large stepsize -- for which the objective evolves nonmonotonically. The acceleration brought by large stepsizes extends to minimizing the population risk for separable distributions, improving on the best-known upper bounds on the number of steps to reach a near-optimum. Finally, we characterize the largest stepsize for the local convergence of GD, which also determines the global convergence in special scenarios. Our results extend the analysis of Wu et al. (2024) from convex settings with minimizers at infinity to strongly convex cases with finite minimizers.
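For context, a standard formalization of this setting (the paper's exact notation and constants may differ): given linearly separable data $(x_i, y_i)$ with $y_i \in \{\pm 1\}$ and regularization strength $\lambda > 0$, the $\ell_2$-regularized logistic regression objective is

$$
F_\lambda(w) \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\bigl(1+\exp(-y_i \langle x_i, w\rangle)\bigr) \;+\; \frac{\lambda}{2}\lVert w\rVert^2 .
$$

This objective is $\lambda$-strongly convex and $(L+\lambda)$-smooth, where $L$ bounds the smoothness of the logistic term, so the relevant condition number scales as $\kappa \approx (L+\lambda)/\lambda$, which is large when $\lambda$ is small.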
Authors (3)
Jingfeng Wu
Pierre Marion
Peter Bartlett
Submitted
June 3, 2025
arXiv Category
stat.ML

Key Contributions

Shows that gradient descent with a large constant stepsize reaches a near-optimum of $\ell_2$-regularized logistic regression on linearly separable data in $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ steps, even though the objective evolves non-monotonically along the way. This improves on the classical $\widetilde{\mathcal{O}}(\kappa)$ rate obtained with small stepsizes chosen to guarantee monotonic descent; see the numerical sketch below.
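The following is a minimal numerical sketch of the phenomenon, not the authors' code: the data, the regularization strength `lam`, and the stepsize choices are illustrative assumptions. It runs plain constant-stepsize GD on a separable toy problem once with a small "safe" stepsize and once with a much larger one, so the (possibly non-monotone) loss trajectory of the large-stepsize run can be inspected directly.

```python
# Minimal sketch of the large-stepsize phenomenon; not the paper's implementation.
# Data, lam, and the stepsize constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate,
# so the direction e_1 separates the classes through the origin.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])
y[y == 0] = 1.0

lam = 1e-3  # l2-regularization strength (illustrative)

def objective(w):
    """l2-regularized logistic loss F_lambda(w)."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.dot(w, w)

def gradient(w):
    margins = y * (X @ w)
    probs = 1.0 / (1.0 + np.exp(np.clip(margins, -500, 500)))  # sigmoid(-margin)
    return -(X.T @ (probs * y)) / n + lam * w

def run_gd(stepsize, num_steps=500):
    """Constant-stepsize gradient descent from zero; returns the loss trajectory."""
    w = np.zeros(d)
    losses = []
    for _ in range(num_steps):
        losses.append(objective(w))
        w -= stepsize * gradient(w)
    return np.array(losses)

# The logistic term is at most (mean ||x_i||^2 / 4)-smooth; the classical "safe"
# stepsize is of order 1/(L + lam), while the large-stepsize regime uses a much
# bigger constant, under which the loss may oscillate before settling.
L = np.mean(np.sum(X**2, axis=1)) / 4.0
small = run_gd(stepsize=1.0 / (L + lam))
large = run_gd(stepsize=20.0 / (L + lam))

print("final loss, small stepsize:", small[-1])
print("final loss, large stepsize:", large[-1])
print("large-stepsize loss monotone?", bool(np.all(np.diff(large) <= 0)))
```

With choices like these, the large-stepsize run may show early oscillations in the loss before converging, which is the non-monotone behavior the paper analyzes; the precise stepsize threshold and the $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ rate are established theoretically in the paper, not by this sketch.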

Business Value

Provides theoretical guidance on stepsize selection that can make training faster and more efficient, particularly for logistic-regression-style models, potentially reducing computational costs.