📄 Abstract
Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are closely connected: accelerated optimization can be understood through the lens of gradient-variation online learning. In this paper, we investigate online learning with Hölder smooth functions, a general class encompassing both smooth and non-smooth (Lipschitz) functions, and explore its implications for offline optimization. For (strongly) convex online functions, we design corresponding gradient-variation online learning algorithms whose regret smoothly interpolates between the optimal guarantees in the smooth and non-smooth regimes. Notably, our algorithms do not require prior knowledge of the Hölder smoothness parameter, exhibiting strong adaptivity over existing methods. Through online-to-batch conversion, this gradient-variation online adaptivity yields an optimal universal method for stochastic convex optimization under Hölder smoothness. However, achieving universality in offline strongly convex optimization is more challenging. We address this by integrating online adaptivity with a detection-based guess-and-check procedure, which, for the first time, yields a universal offline method that achieves accelerated convergence in the smooth regime while maintaining near-optimal convergence in the non-smooth one.
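
To make the gradient-variation mechanism concrete, below is a minimal sketch of optimistic online gradient descent, the standard template behind gradient-variation regret bounds: the learner reuses the previous round's gradient as a hint, so regret scales with the gradient variation across rounds rather than with the horizon T alone. This is an illustrative sketch, not the paper's exact method; the function names, the fixed step size eta, and the projection proj are assumptions, and the paper's algorithms additionally adapt to the unknown Hölder smoothness parameter.

import numpy as np

def optimistic_ogd(grad, x0, T, eta=0.1, proj=lambda x: x):
    # Optimistic online gradient descent (illustrative sketch).
    # Using the last observed gradient as a hint makes the regret
    # depend on the variation sum_t ||g_t - g_{t-1}||^2.
    x_hat = np.asarray(x0, dtype=float)  # "lazy" iterate, updated with true gradients
    hint = np.zeros_like(x_hat)          # optimistic guess: previous gradient
    plays = []
    for t in range(T):
        x = proj(x_hat - eta * hint)     # play the hint-corrected point
        plays.append(x)
        g = grad(x, t)                   # observe the gradient of f_t at x
        x_hat = proj(x_hat - eta * g)    # update the lazy iterate
        hint = g                         # next round's hint
    return plays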
Key Contributions
This paper introduces gradient-variation online learning algorithms for Hölder smooth functions whose regret guarantees are optimal and interpolate between the smooth and non-smooth regimes. A key innovation is strong adaptivity: the algorithms need no prior knowledge of the Hölder smoothness parameter, improving on existing methods. Through online-to-batch conversion and a detection-based guess-and-check procedure, the same adaptivity yields universal methods for stochastic convex and offline strongly convex optimization.
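
The online-to-batch conversion mentioned above can be sketched in a few lines: run an online learner on stochastic gradients and return the averaged iterate, whose expected excess risk is bounded by the average regret. A minimal sketch, assuming plain online gradient descent as the learner and a fixed step size; the paper instead plugs in its adaptive gradient-variation learner.

import numpy as np

def online_to_batch(stoch_grad, x0, T, eta=0.1):
    # Online-to-batch conversion (minimal sketch): feed stochastic
    # gradients to an online learner and average its iterates;
    # expected excess risk is at most (expected regret) / T.
    x = np.asarray(x0, dtype=float)
    running_sum = np.zeros_like(x)
    for t in range(T):
        running_sum += x
        g = stoch_grad(x)        # unbiased gradient estimate at the current point
        x = x - eta * g          # one online-learning step
    return running_sum / T       # averaged iterate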
Business Value
Improved efficiency and robustness in online learning systems, leading to better decision-making in dynamic environments without requiring extensive parameter tuning.