On the Neural Feature Ansatz for Deep Neural Networks

Abstract

Understanding feature learning is an important open question in establishing a mathematical foundation for deep neural networks. The Neural Feature Ansatz (NFA) states that, after training, the Gram matrix of the first-layer weights of a deep neural network is proportional to some power $\alpha>0$ of the average gradient outer product (AGOP) of this network with respect to its inputs. Assuming gradient flow dynamics with balanced weight initialization, the NFA was proven to hold throughout training for two-layer linear networks with exponent $\alpha = 1/2$ (Radhakrishnan et al., 2024). We extend this result to networks with $L \geq 2$ layers, showing that the NFA holds with exponent $\alpha = 1/L$, thus demonstrating a depth dependency of the NFA. Furthermore, we prove that for unbalanced initialization, the NFA holds asymptotically through training if weight decay is applied. We also provide counterexamples showing that the NFA does not hold for some network architectures with nonlinear activations, even when these networks fit the training data arbitrarily well. We thoroughly validate our theoretical results through numerical experiments across a variety of optimization algorithms, weight decay rates, and initialization schemes.
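
To make the statement concrete, below is a minimal numerical sketch (not the authors' code) for the linear case: a deep linear network $f(x) = W_L \cdots W_1 x$ is trained by plain gradient descent from a balanced initialization, and the first-layer Gram matrix $W_1^\top W_1$ is then compared against the $1/L$ matrix power of the AGOP. All names, hyperparameters, and the regression task are arbitrary illustrative choices; gradient descent with a small step size stands in for the gradient flow analyzed in the paper.

```python
# Sketch of the NFA for a deep linear network (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
d, L, n, lr, steps = 8, 3, 256, 1e-2, 5000

# Balanced initialization: with W_l = O_{l+1} D O_l^T (O orthogonal, D diagonal),
# the balancedness condition W_{l+1}^T W_{l+1} = W_l W_l^T holds at init.
def orth(k):
    return np.linalg.qr(rng.standard_normal((k, k)))[0]

D = np.diag(rng.uniform(0.5, 1.0, d))
Os = [orth(d) for _ in range(L + 1)]
Ws = [Os[l + 1] @ D @ Os[l].T for l in range(L)]

# Linear regression task with a planted teacher matrix A.
X = rng.standard_normal((n, d))
A = rng.standard_normal((d, d))
Y = X @ A.T

for _ in range(steps):
    M = Ws[0]
    for W in Ws[1:]:
        M = W @ M                           # end-to-end map M = W_L ... W_1
    G = ((X @ M.T - Y).T @ X) / n           # gradient of the loss w.r.t. M
    grads = []
    for l in range(L):
        left, right = np.eye(d), np.eye(d)
        for W in Ws[l + 1:]:
            left = W @ left                 # W_L ... W_{l+1}
        for W in Ws[:l]:
            right = W @ right               # W_{l-1} ... W_1
        grads.append(left.T @ G @ right.T)  # chain rule for the l-th factor
    for l in range(L):
        Ws[l] -= lr * grads[l]

# For a linear network the input Jacobian is M everywhere, so AGOP = M^T M.
M = Ws[0]
for W in Ws[1:]:
    M = W @ M
evals, evecs = np.linalg.eigh(M.T @ M)
agop_pow = evecs @ np.diag(np.clip(evals, 0.0, None) ** (1.0 / L)) @ evecs.T
gram = Ws[0].T @ Ws[0]
cos = np.sum(gram * agop_pow) / (np.linalg.norm(gram) * np.linalg.norm(agop_pow))
print(f"cosine(W_1^T W_1, AGOP^(1/{L})) = {cos:.4f}")  # close to 1 if the NFA holds
```

Since proportionality is invariant to scale, the cosine similarity between the two (vectorized) matrices is a natural check: a value near 1 indicates the Gram matrix is approximately a positive multiple of the $1/L$ power of the AGOP.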
Authors (3)
Edward Tansley
Estelle Massart
Coralia Cartis
Submitted
October 17, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper extends the Neural Feature Ansatz (NFA) to deep neural networks with $L \geq 2$ layers, proving that it holds with exponent $\alpha = 1/L$ and thereby demonstrating a depth dependency. It also proves that the NFA holds asymptotically under weight decay for unbalanced initialization, and provides counterexamples for certain architectures with nonlinear activations, contributing to a deeper theoretical understanding of feature learning in deep networks.
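
One way to see why weight decay helps in the unbalanced case: for linear networks under gradient flow, the layer imbalance $W_{l+1}^\top W_{l+1} - W_l W_l^\top$ is conserved, and adding an $\ell_2$ penalty with coefficient $\lambda$ makes it decay like $e^{-2\lambda t}$. The sketch below (again with arbitrary, illustrative hyperparameters; not the authors' code) starts from an unbalanced Gaussian initialization and tracks this imbalance under gradient descent with weight decay.

```python
# Sketch: weight decay drives an unbalanced linear network toward balance.
import numpy as np

rng = np.random.default_rng(1)
d, L, n, lr, lam = 8, 3, 256, 1e-2, 0.05
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]  # unbalanced init
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal((d, d)).T

def imbalance(Ws):
    # Largest Frobenius norm of W_{l+1}^T W_{l+1} - W_l W_l^T over layer pairs.
    return max(np.linalg.norm(Ws[l + 1].T @ Ws[l + 1] - Ws[l] @ Ws[l].T)
               for l in range(L - 1))

for step in range(4001):
    M = Ws[0]
    for W in Ws[1:]:
        M = W @ M
    G = ((X @ M.T - Y).T @ X) / n
    grads = []
    for l in range(L):
        left, right = np.eye(d), np.eye(d)
        for W in Ws[l + 1:]:
            left = W @ left
        for W in Ws[:l]:
            right = W @ right
        grads.append(left.T @ G @ right.T)
    for l in range(L):
        Ws[l] -= lr * (grads[l] + lam * Ws[l])  # gradient step with weight decay
    if step % 1000 == 0:
        print(step, round(imbalance(Ws), 6))    # imbalance shrinks over training
```

The printed imbalance shrinks toward zero, consistent with the paper's asymptotic result: once the layers become approximately balanced, the setting of the main theorem is recovered.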

Business Value

Advances fundamental understanding of deep learning, which can indirectly lead to more robust, interpretable, and efficient models in the long run.