📄 Abstract
We study gradient flows for loss landscapes of fully connected feedforward
neural networks with commonly used continuously differentiable activation
functions such as the logistic, hyperbolic tangent, softplus or GELU function.
We prove that the gradient flow either converges to a critical point or
diverges to infinity while the loss converges to an asymptotic critical value.
Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the
loss value of any gradient flow initialized at most $\varepsilon$ above the
optimal level converges to it. For polynomial target functions and sufficiently
large architectures and data sets, we prove that the optimal loss value is zero and
can only be realized asymptotically. From this setting, we deduce our main
result that any gradient flow with sufficiently good initialization diverges to
infinity. Our proof heavily relies on the geometry of o-minimal structures. We
confirm these theoretical findings with numerical experiments and extend our
investigation to more realistic scenarios, where we observe an analogous
behavior.
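The central dichotomy of the abstract can be written out as follows; the symbols $\Theta$ for the parameter trajectory and $\mathcal{L}$ for the loss are chosen here for illustration and need not match the paper's notation.

```latex
% Gradient flow and the convergence/divergence dichotomy stated in the abstract.
% \Theta (parameters) and \mathcal{L} (loss) are illustrative symbols chosen here.
\begin{gather*}
  \Theta'(t) = -\nabla \mathcal{L}\bigl(\Theta(t)\bigr), \qquad t \ge 0, \\[4pt]
  \text{either}\quad \lim_{t \to \infty} \Theta(t) = \Theta^{\ast}
  \ \text{ with }\ \nabla \mathcal{L}(\Theta^{\ast}) = 0,
  \quad\text{or}\quad \lim_{t \to \infty} \lVert \Theta(t) \rVert = \infty \\
  \text{while } \mathcal{L}\bigl(\Theta(t)\bigr)
  \text{ converges to an asymptotic critical value of } \mathcal{L}.
\end{gather*}
```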
Authors (4)
Julian Kranz
Davide Gallon
Steffen Dereich
Arnulf Jentzen
Key Contributions
This paper analyzes gradient flows for neural networks with common continuously differentiable activation functions, proving that each flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. It establishes a threshold $\varepsilon>0$ such that any flow initialized at most $\varepsilon$ above the optimal loss level converges to that level, and it shows that for polynomial targets with sufficiently large architectures and data sets the optimal loss value is zero and can only be realized asymptotically. The main result, proven using the geometry of o-minimal structures, is that every sufficiently well-initialized gradient flow diverges to infinity. A rough numerical illustration of this phenomenon is sketched below.
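The following sketch is not the authors' experimental setup; it simply discretizes the gradient flow by plain gradient descent with a small step size for a shallow softplus network fitting the polynomial target $f(x)=x^2$. The width, data size, step size, and iteration count are arbitrary choices made here for illustration.

```python
# Illustrative sketch (not the paper's code): Euler discretization of the
# gradient flow for a width-m softplus network fitting y = x^2 on [-1, 1].
import numpy as np

rng = np.random.default_rng(0)

# data: n points, polynomial target
n = 64
x = np.linspace(-1.0, 1.0, n)
y = x ** 2

# shallow network parameters (small random initialization, chosen arbitrarily)
m = 16
W1 = rng.normal(scale=0.5, size=m)   # input weights
b1 = rng.normal(scale=0.5, size=m)   # hidden biases
W2 = rng.normal(scale=0.5, size=m)   # output weights
b2 = 0.0                             # output bias

def softplus(z):
    # numerically stable softplus log(1 + exp(z))
    return np.logaddexp(0.0, z)

def sigmoid(z):
    # stable sigmoid via tanh; sigmoid is the derivative of softplus
    return 0.5 * (1.0 + np.tanh(0.5 * z))

eta = 1e-2                           # step size of the Euler discretization
for step in range(100_000):
    # forward pass
    z = np.outer(x, W1) + b1         # shape (n, m)
    h = softplus(z)
    pred = h @ W2 + b2               # shape (n,)
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)

    # backward pass for the mean squared error loss
    g_pred = err / n
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum()
    g_h = np.outer(g_pred, W2)
    g_z = g_h * sigmoid(z)
    g_W1 = x @ g_z
    g_b1 = g_z.sum(axis=0)

    # Euler step of the gradient flow
    W1 -= eta * g_W1
    b1 -= eta * g_b1
    W2 -= eta * g_W2
    b2 -= eta * g_b2

    if step % 10_000 == 0:
        norm = np.sqrt(np.sum(W1**2) + np.sum(b1**2) + np.sum(W2**2) + b2**2)
        print(f"step {step:7d}  loss {loss:.3e}  parameter norm {norm:.2f}")
```

In runs of this kind one would look for the loss approaching zero while the parameter norm keeps growing, which is the qualitative behavior the paper proves for well-initialized gradient flows.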
Business Value
Fundamental theoretical understanding that can lead to more robust and predictable neural network training algorithms, potentially improving reliability and reducing training failures in complex ML applications.