Theoretical Research · Audience: researchers in optimization, machine learning practitioners, applied mathematicians

A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees

Abstract

Finding an $\epsilon$-stationary point of a nonconvex function with a Lipschitz continuous Hessian is a central problem in optimization. Regularized Newton methods are a classical tool and have been studied extensively, yet they still face a trade-off between global and local convergence. Whether a parameter-free algorithm of this type can simultaneously achieve optimal global complexity and quadratic local convergence remains an open question. To bridge this long-standing gap, we propose a new class of regularizers constructed from the current and previous gradients, and leverage the conjugate gradient approach with a negative curvature monitor to solve the regularized Newton equation. The proposed algorithm is adaptive, requiring no prior knowledge of the Hessian Lipschitz constant, and achieves global complexities of $O(\epsilon^{-3/2})$ in terms of second-order oracle calls and $\tilde{O}(\epsilon^{-7/4})$ in terms of Hessian-vector products. When the iterates converge to a point where the Hessian is positive definite, the method exhibits quadratic local convergence. Preliminary numerical results, including training physics-informed neural networks, illustrate the competitiveness of our algorithm.
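
The basic iteration can be illustrated with a minimal sketch. The regularizer below, which combines the norms of the current gradient and the gradient difference, is an illustrative assumption rather than the paper's exact formula, and a dense linear solve stands in for the paper's conjugate gradient procedure:

```python
# Minimal sketch of a gradient-regularized Newton iteration.
# The regularizer sigma is a hypothetical choice built from the current and
# previous gradients; it is NOT the paper's exact update rule.
import numpy as np

def regularized_newton_step(grad, hess, grad_prev, c=1.0):
    """Return d solving (H + sigma * I) d = -g for an illustrative sigma."""
    sigma = c * np.sqrt(np.linalg.norm(grad) + np.linalg.norm(grad - grad_prev))
    H_reg = hess + sigma * np.eye(hess.shape[0])
    return np.linalg.solve(H_reg, -grad)

def minimize(grad_f, hess_f, x0, eps=1e-6, max_iter=100):
    """Iterate until an eps-stationary point (||grad|| <= eps) is found."""
    x, g_prev = x0.copy(), grad_f(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x + regularized_newton_step(g, hess_f(x), g_prev)
        g_prev = g
    return x
```

As the gradient norm shrinks, a regularizer of this kind vanishes, so the step approaches a pure Newton step near a well-conditioned minimizer, which is what makes fast local convergence plausible.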
Authors (6)
Yuhao Zhou
Jintao Xu
Bingrui Li
Chenglong Bao
Chao Ding
Jun Zhu
Submitted
February 7, 2025
arXiv Category
math.OC
arXiv PDF

Key Contributions

Proposes a new class of regularized Newton methods for nonconvex optimization that achieve optimal global complexity ($O(\epsilon^{-3/2})$ second-order oracle calls), improved Hessian-vector product complexity ($\tilde{O}(\epsilon^{-7/4})$), and quadratic local convergence when the limit point has a positive definite Hessian, all without prior knowledge of the Hessian Lipschitz constant. This bridges the long-standing gap between global and local convergence guarantees for parameter-free algorithms.
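
The Hessian-vector-product complexity comes from solving the regularized Newton equation inexactly with conjugate gradients while monitoring for directions of nonpositive curvature. The sketch below shows that generic idea, assuming an `hvp` callback that returns Hessian-vector products; it is a capped-CG-style illustration, not the paper's exact monitor:

```python
# Hedged sketch: conjugate gradient on (H + sigma * I) d = -g using only
# Hessian-vector products, with a simple negative-curvature monitor.
import numpy as np

def cg_with_curvature_monitor(hvp, grad, sigma, tol=1e-8, max_iter=200):
    """hvp(v) should return H @ v. Returns ('sol', d) or ('neg_curv', p)."""
    d = np.zeros_like(grad)
    r = -grad.copy()                  # residual of (H + sigma I) d = -g at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hvp(p) + sigma * p
        curv = p @ Hp
        if curv <= 0:                 # nonpositive curvature detected:
            return "neg_curv", p      # report the direction instead of solving
        alpha = rs / curv
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:    # regularized Newton equation solved
            return "sol", d
        p = r + (rs_new / rs) * p
        rs = rs_new
    return "sol", d
```

In a method of this type, the negative-curvature direction returned by the monitor can be used as a descent step itself, so each iteration makes progress whether or not the regularized system is positive definite.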

Business Value

Enables faster and more reliable training of models with nonconvex loss landscapes, such as physics-informed neural networks, by reducing the number of expensive second-order oracle calls and Hessian-vector products needed to reach a stationary point, and by avoiding per-problem tuning of the regularization parameters.