Abstract
This paper establishes a continuous-time approximation, a piece-wise
continuous differential equation, for the discrete Heavy-Ball (HB) momentum
method with an explicit discretization error. Investigating continuous
differential equations has been a promising approach for studying discrete
optimization methods. Despite the crucial role of momentum in gradient-based
optimization, the gap between the original discrete dynamics and their
continuous-time approximations, caused by the discretization error, has not
yet been comprehensively bridged. In this work, we study the HB momentum
method in continuous time with a particular focus on the discretization
error, providing additional theoretical tools for this area. In particular,
we design a first-order piece-wise continuous differential equation in which
we add a number of counter terms that account for the discretization error
explicitly. As a result, we obtain a continuous-time model of the HB momentum
method in which the discretization error can be controlled to an arbitrary
order of the step size. As an application, we use this model to identify a
new implicit regularization of directional smoothness and to investigate the
implicit bias of HB for diagonal linear networks, indicating how our results
can be applied in deep learning. Our theoretical findings are further
supported by numerical experiments.
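To make the gap the abstract refers to concrete, here is a minimal numerical sketch (not code from the paper): it runs the standard discrete HB recursion x_{k+1} = x_k - η∇f(x_k) + β(x_k - x_{k-1}) on a quadratic and compares it against the common first-order continuous-time baseline dX/dt = -∇f(X)/(1-β). The objective, step size, and momentum value are illustrative assumptions; the residual gap printed at the end is the kind of discretization error the paper's counter terms are designed to cancel.

```python
import numpy as np

# Quadratic test objective f(x) = 0.5 * x^T A x, so grad f(x) = A x.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

eta, beta = 0.01, 0.9          # step size and momentum coefficient (illustrative)
x = x_prev = np.array([1.0, 1.0])

# Discrete Heavy-Ball recursion: x_{k+1} = x_k - eta * grad f(x_k) + beta * (x_k - x_{k-1}).
hb = [x.copy()]
for _ in range(200):
    x, x_prev = x - eta * grad(x) + beta * (x - x_prev), x
    hb.append(x.copy())

# Baseline first-order continuous-time model (NOT the paper's counter-term ODE):
# dX/dt = -grad f(X) / (1 - beta), with time t = k * eta. We integrate on a much
# finer Euler grid so the remaining gap is dominated by the modelling error
# between HB and this ODE rather than by the integrator itself.
X = np.array([1.0, 1.0])
ode = [X.copy()]
sub = 100                       # fine substeps per HB step
for _ in range(200):
    for _ in range(sub):
        X = X - (eta / sub) * grad(X) / (1.0 - beta)
    ode.append(X.copy())

gap = max(np.linalg.norm(a - b) for a, b in zip(hb, ode))
print(f"max gap between HB iterates and the first-order ODE model: {gap:.4f}")
```

Shrinking eta shrinks the printed gap, but for any fixed step size it stays nonzero; closing it at a fixed step size requires correcting the ODE itself, which is the role the paper assigns to its counter terms.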
Authors (6)
Bochen Lyu
Xiaojing Zhang
Fangyi Zheng
He Wang
Zheng Wang
Zhanxing Zhu
Key Contributions
This paper bridges the gap between discrete and continuous time approximations for the Heavy-Ball momentum method by explicitly accounting for discretization error. It introduces a novel first-order piece-wise continuous differential equation with counter terms to model this error, providing new theoretical tools for understanding gradient-based optimization.
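Schematically, counter-term constructions of this kind work in the spirit of backward error analysis: the baseline ODE is augmented with step-size-dependent corrections so that it tracks the discrete iterates to higher order. The sketch below is a generic template of that idea; the correction terms g_i are hypothetical placeholders, not the paper's actual construction.

```latex
% Generic counter-term ODE template (backward-error-analysis style);
% the g_i are hypothetical placeholders, not the paper's correction terms.
\[
  \dot{X}(t) \;=\; -\frac{1}{1-\beta}\,\nabla f\big(X(t)\big)
  \;+\; \sum_{i=1}^{m} \eta^{\,i}\, g_i\big(X(t)\big),
  \qquad
  x_k \;=\; X(k\eta) \;+\; O\!\big(\eta^{\,m+1}\big).
\]
```

Choosing each g_i to cancel the i-th order Taylor mismatch between the discrete iterates and the ODE flow is what "controlling the discretization error to an arbitrary order of the step size" means operationally: truncating the sum at order m leaves a residual of order m+1.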
Business Value
Improved theoretical understanding of optimization algorithms can lead to more efficient and robust training of machine learning models, potentially reducing computational costs and improving performance in various AI applications.