Abstract
Understanding feature learning is an important open question in establishing
a mathematical foundation for deep neural networks. The Neural Feature Ansatz
(NFA) states that after training, the Gram matrix of the first-layer weights of
a deep neural network is proportional to some power $\alpha>0$ of the average
gradient outer product (AGOP) of this network with respect to its inputs.
Assuming gradient flow dynamics with balanced weight initialization, the NFA
was proven to hold throughout training for two-layer linear networks with
exponent $\alpha = 1/2$ (Radhakrishnan et al., 2024). We extend this result to
networks with $L \geq 2$ layers, showing that the NFA holds with exponent
$\alpha = 1/L$, thus demonstrating a depth dependency of the NFA. Furthermore,
we prove that for unbalanced initialization, the NFA holds asymptotically
through training if weight decay is applied. We also provide counterexamples
showing that the NFA does not hold for some network architectures with
nonlinear activations, even when these networks fit the training data arbitrarily
well. We thoroughly validate our theoretical results through numerical
experiments across a variety of optimization algorithms, weight decay rates and
initialization schemes.
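
For a scalar-valued network $f$ with first-layer weight matrix $W_1$ and training inputs $x_1, \dots, x_n$, the statement above can be written out as follows (our paraphrase of the usual formulation of the NFA, not a quotation from the paper):

$$
W_1^\top W_1 \;\propto\; \left( \frac{1}{n} \sum_{i=1}^{n} \nabla_x f(x_i)\, \nabla_x f(x_i)^\top \right)^{\alpha}, \qquad \alpha = \frac{1}{L},
$$

where the bracketed matrix is the AGOP and its power is taken spectrally (the AGOP is symmetric positive semidefinite).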
Authors (3)
Edward Tansley
Estelle Massart
Coralia Cartis
Submitted
October 17, 2025
Key Contributions
This paper extends the Neural Feature Ansatz (NFA) to deep neural networks with $L \geq 2$ layers, proving that it holds with exponent $\alpha = 1/L$ and thus demonstrating a depth dependency of the NFA. It also proves that, for unbalanced initialization, the NFA holds asymptotically when weight decay is applied, and it provides counterexamples for certain architectures with nonlinear activations, contributing to a deeper theoretical understanding of feature learning in deep networks.
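
As a minimal numerical illustration of the $\alpha = 1/L$ claim (a sketch under simplifying assumptions, not the paper's experimental setup), the NumPy script below trains a depth-$L$ linear network with plain gradient descent on a synthetic linear-teacher regression problem, using a small random initialization as a stand-in for the balanced initialization assumed by the theory, and then compares $W_1^\top W_1$ against the $1/L$ power of the AGOP. The data, hyperparameters, and helper names (`end_to_end`, the cosine-similarity check) are illustrative choices, not taken from the paper; with these choices the printed similarity should come out close to 1.

```python
# Minimal sketch: check W_1^T W_1 ∝ (AGOP)^(1/L) for a deep linear network.
# Assumptions (not from the paper): synthetic linear-teacher data, plain gradient
# descent as a proxy for gradient flow, and a small random initialization as an
# approximation to exactly balanced weights.
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 3, 6, 500                      # depth, input dimension, sample count

X = rng.standard_normal((n, d))
w_teacher = rng.standard_normal((d, 1))
w_teacher /= np.linalg.norm(w_teacher)
y = X @ w_teacher                        # noiseless targets from a linear teacher

scale = 0.03                             # small init ~ approximately balanced
Ws = [scale * rng.standard_normal((d, d)) for _ in range(L - 1)]
Ws.append(scale * rng.standard_normal((1, d)))  # scalar output layer

def end_to_end(Ws):
    """Product P = W_L ... W_1, so that f(x) = P x."""
    P = np.eye(d)
    for W in Ws:
        P = W @ P
    return P                             # shape (1, d)

lr, steps = 0.05, 10_000
for _ in range(steps):
    P = end_to_end(Ws)
    resid = X @ P.T - y                  # (n, 1) residuals for squared loss
    dP = (resid.T @ X) / n               # gradient of the loss w.r.t. P, shape (1, d)
    grads = []
    for l in range(L):
        A = np.eye(1)                    # product of the layers above layer l
        for Wk in reversed(Ws[l + 1:]):
            A = A @ Wk
        B = np.eye(d)                    # product of the layers below layer l
        for Wk in Ws[:l]:
            B = Wk @ B
        grads.append(A.T @ dP @ B.T)     # chain rule for P = A W_l B
    Ws = [W - lr * G for W, G in zip(Ws, grads)]

# AGOP: for a linear network, grad_x f(x) = P^T for every x, so AGOP = P^T P.
P = end_to_end(Ws)
agop = P.T @ P
evals, evecs = np.linalg.eigh(agop)      # matrix power via eigendecomposition
agop_pow = evecs @ np.diag(np.clip(evals, 0.0, None) ** (1.0 / L)) @ evecs.T
gram = Ws[0].T @ Ws[0]                   # Gram matrix of first-layer weights

cos = np.sum(gram * agop_pow) / (np.linalg.norm(gram) * np.linalg.norm(agop_pow))
print(f"cosine similarity between W1^T W1 and AGOP^(1/{L}): {cos:.4f}")
```

An exactly balanced initialization, or an unbalanced one combined with weight decay, could be substituted here to probe the other regimes discussed in the abstract.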
Business Value
Advances fundamental understanding of deep learning, which can indirectly lead to more robust, interpretable, and efficient models in the long run.