Abstract: Armijo line-search (Armijo-LS) is a standard method to set the step-size for
gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to
know the global smoothness constant L and adapts to the "local" smoothness,
enabling GD to converge faster. Existing theoretical analyses show that GD with
Armijo-LS (GD-LS) can result in constant-factor improvements over GD with a 1/L
step-size (denoted as GD(1/L)). We strengthen these results and show that if
the objective function satisfies a certain non-uniform smoothness condition,
GD-LS can result in a faster convergence rate than GD(1/L). In particular, we
prove that for convex objectives corresponding to logistic regression and
multi-class classification, GD-LS can converge to the optimum at a linear rate,
and hence improves over the sublinear convergence of GD(1/L). Furthermore, for
non-convex objectives satisfying gradient domination (e.g., those corresponding
to the softmax policy gradient in RL or generalized linear models with a
logistic link function), GD-LS can match the fast convergence of algorithms
tailored for these specific settings. Finally, we analyze the convergence of
stochastic GD with a stochastic line-search on convex losses under the
interpolation assumption.
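
To make the role of the line-search concrete, the following is a minimal Python sketch of gradient descent with Armijo backtracking applied to a logistic-regression objective, the setting where the abstract states GD-LS attains a linear rate. The sufficient-decrease parameter c, backtracking factor beta, initial step-size eta_max, and the synthetic data are illustrative assumptions and not the paper's exact algorithmic choices.

```python
import numpy as np

def gd_armijo(f, grad, x0, c=0.5, beta=0.8, eta_max=1.0, max_iter=1000, tol=1e-8):
    """Gradient descent with Armijo backtracking line-search (a minimal sketch).

    At each iteration the step-size eta is shrunk geometrically until the
    Armijo (sufficient-decrease) condition holds:
        f(x - eta * g) <= f(x) - c * eta * ||g||^2,
    so the step-size adapts to the local smoothness instead of relying on a
    global smoothness constant L.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        fx, gg = f(x), np.dot(g, g)
        eta = eta_max
        # Backtrack until sufficient decrease is achieved (with a small floor
        # on eta as a numerical safeguard).
        while f(x - eta * g) > fx - c * eta * gg and eta > 1e-12:
            eta *= beta
        x = x - eta * g
    return x

# Example usage on a binary logistic-regression loss with hypothetical data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 5))
    y = rng.choice([-1.0, 1.0], size=50)

    def f(w):
        # Average logistic loss: mean(log(1 + exp(-y_i * a_i^T w))).
        return np.mean(np.log1p(np.exp(-y * (A @ w))))

    def grad(w):
        # Gradient of the average logistic loss.
        s = -y / (1.0 + np.exp(y * (A @ w)))
        return A.T @ s / len(y)

    w_hat = gd_armijo(f, grad, np.zeros(5))
    print("final loss:", f(w_hat))
```

In this sketch, the backtracking loop is what lets the effective step-size grow well beyond 1/L on flat regions of the logistic loss, which is the mechanism the abstract credits for the faster rate compared to GD(1/L).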