Abstract
Nonlinear activation functions are widely recognized for enhancing the
expressivity of neural networks, which is the primary reason for their
widespread adoption. In this work, we focus on the ReLU activation and reveal
a novel and intriguing property of nonlinear activations. By comparing wide
neural networks with the nonlinear activations enabled and disabled, we
demonstrate two specific effects: (a) better feature separation, i.e., a larger
angle separation for similar data in the model-gradient feature space, and
(b) better NTK conditioning, i.e., a smaller condition number of the neural
tangent kernel (NTK). We further show that network depth (i.e., more nonlinear
activation operations) amplifies these effects; moreover, in the
infinite-width-then-depth limit, all data are equally separated, at a fixed
angle, in the model-gradient feature space, regardless of how similar they
originally are in the input space. In contrast, without the nonlinear
activation, i.e., in a linear neural network, the data separation remains the
same as for the original inputs and the NTK condition number equals that of
the input Gram matrix, regardless of the network depth. Due to the close
connection between the NTK condition number and convergence theories, our
results imply that nonlinear activation helps to improve the worst-case
convergence rates of gradient-based methods.
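
The two effects can be checked numerically. Below is a minimal sketch (not the paper's code) that compares a random two-layer ReLU network f(x) = v^T ReLU(Wx)/sqrt(m) against its linear counterpart: it builds per-sample model gradients, forms the empirical NTK Gram matrix, and reports its condition number together with the angle between the gradient features of two nearly parallel inputs. The width m, sample count, and helper names are arbitrary choices made for illustration.

```python
# Minimal numerical sketch of (a) feature separation and (b) NTK conditioning
# for a random two-layer network, with ReLU enabled vs. disabled (linear).
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 16, 8192, 8                        # input dim, width, number of samples

# Unit-norm inputs, two of which are nearly parallel (very similar in input space).
X = rng.normal(size=(n, d))
X[1] = X[0] + 0.05 * rng.normal(size=d)      # make x_1 close to x_0
X /= np.linalg.norm(X, axis=1, keepdims=True)

W = rng.normal(size=(m, d))                  # first-layer weights
v = rng.normal(size=m)                       # second-layer weights

def model_gradients(X, relu=True):
    """Per-sample gradient of f(x) = v^T phi(Wx)/sqrt(m) w.r.t. all parameters (W, v)."""
    pre = X @ W.T                                             # (n, m) pre-activations
    act = np.maximum(pre, 0.0) if relu else pre
    gate = (pre > 0).astype(float) if relu else np.ones_like(pre)
    grad_v = act / np.sqrt(m)                                 # d f / d v_k
    grad_W = (gate * v)[:, :, None] * X[:, None, :] / np.sqrt(m)   # d f / d W_{kj}
    return np.concatenate([grad_W.reshape(len(X), -1), grad_v], axis=1)

inp_cos = X[0] @ X[1]
print(f"input-space angle(x0, x1) = "
      f"{np.degrees(np.arccos(np.clip(inp_cos, -1, 1))):6.2f} deg")

for relu in (True, False):
    G = model_gradients(X, relu=relu)        # model-gradient features
    K = G @ G.T                              # empirical NTK Gram matrix
    cos01 = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    print(f"ReLU={relu}:  cond(NTK) = {np.linalg.cond(K):10.2f}   "
          f"gradient-feature angle(x0, x1) = "
          f"{np.degrees(np.arccos(np.clip(cos01, -1, 1))):6.2f} deg")
```

Under this setup one expects the ReLU network to show a larger angle between the gradient features of the two similar inputs and a smaller NTK condition number, while the linear network roughly reproduces the input-space angle and the conditioning of the input Gram matrix, consistent with the claims above.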
Authors (4)
Chaoyue Liu
Han Bi
Like Hui
Xiao Liu
Key Contributions
This work reveals a novel property of ReLU nonlinear activations in wide neural networks: they improve feature separation in the model-gradient space and enhance NTK conditioning (i.e., reduce the NTK condition number). These benefits are amplified by network depth and suggest more favorable worst-case convergence for gradient-based training, offering a theoretical explanation, beyond expressivity, for the widespread use of nonlinearities.
Business Value
Provides fundamental insights into neural network behavior, guiding the design of more effective and stable deep learning architectures, potentially leading to improved performance in various AI applications.