
A simple mean field model of feature learning

πŸ“„ Abstract

Feature learning (FL), where neural networks adapt their internal representations during training, remains poorly understood. Using methods from statistical physics, we derive a tractable, self-consistent mean-field (MF) theory for the Bayesian posterior of two-layer non-linear networks trained with stochastic gradient Langevin dynamics (SGLD). At infinite width, this theory reduces to kernel ridge regression, but at finite width it predicts a symmetry-breaking phase transition where networks abruptly align with target functions. While the basic MF theory provides theoretical insight into the emergence of FL in the finite-width regime, semi-quantitatively predicting the onset of FL with noise or sample size, it substantially underestimates the improvements in generalisation after the transition. We trace this discrepancy to a key mechanism absent from the plain MF description: 'self-reinforcing input feature selection'. Incorporating this mechanism into the MF theory allows us to quantitatively match the learning curves of SGLD-trained networks and provides mechanistic insight into FL.
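The SGLD training setup described in the abstract can be illustrated with a minimal sketch: full-batch Langevin dynamics sampling the Bayesian posterior of a two-layer tanh network on a synthetic single-index regression task. Every choice below (width, learning rate, temperature, prior variance, target function) is an illustrative assumption, not the paper's experimental setup.

```python
# Minimal sketch (not the paper's exact setup): SGLD sampling of the posterior
# of a two-layer tanh network on a synthetic 1-D regression task.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-index teacher: y = tanh(w* . x)
n_samples, dim, width = 64, 8, 128
w_star = rng.normal(size=dim) / np.sqrt(dim)
X = rng.normal(size=(n_samples, dim))
y = np.tanh(X @ w_star)

# Two-layer student: f(x) = a . tanh(W x) / sqrt(width)
W = rng.normal(size=(width, dim)) / np.sqrt(dim)
a = rng.normal(size=width)

lr, temperature, prior_var = 1e-3, 1e-3, 1.0  # illustrative hyperparameters

def forward(X, W, a):
    h = np.tanh(X @ W.T)              # hidden activations, shape (n, width)
    return h, h @ a / np.sqrt(width)  # network output, shape (n,)

for step in range(5000):
    h, pred = forward(X, W, a)
    err = pred - y                    # residuals
    # Gradients of the (unnormalised) negative log-posterior:
    # squared loss plus a Gaussian prior on both weight layers.
    grad_a = h.T @ err / np.sqrt(width) + a / prior_var
    grad_W = ((err[:, None] * (1 - h**2)) * (a / np.sqrt(width))).T @ X + W / prior_var
    # Langevin update: gradient step plus Gaussian noise at the chosen temperature.
    a -= lr * grad_a + np.sqrt(2 * lr * temperature) * rng.normal(size=a.shape)
    W -= lr * grad_W + np.sqrt(2 * lr * temperature) * rng.normal(size=W.shape)

_, pred = forward(X, W, a)
print("train MSE:", float(np.mean((pred - y) ** 2)))
```

In the paper's setting the relevant quantities are posterior averages over many such Langevin trajectories; the loop above only shows the update rule itself.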
Authors (4): Niclas GΓΆring, Chris Mingard, Yoonsoo Nam, Ard Louis
Submitted: October 16, 2025
arXiv Category: cs.LG

Key Contributions

This paper derives a tractable mean-field (MF) theory for feature learning in two-layer neural networks trained with SGLD, using methods from statistical physics. The theory predicts a finite-width phase transition at which networks abruptly align with target functions, and identifies 'self-reinforcing input feature selection' as a key mechanism, absent from the basic MF theory, that is needed to explain the improved generalisation observed after the transition.
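As context for the infinite-width limit mentioned in the abstract, the theory there reduces to kernel ridge regression, whose posterior-mean predictor takes the standard form below. The specific kernel k and ridge parameter Ξ» implied by the paper's MF theory are not given in this summary, so both are placeholders.

$$
f^{*}(x) \;=\; k(x, X)^{\top}\,\bigl(K + \lambda I\bigr)^{-1} y,
\qquad K_{ij} = k(x_i, x_j),
$$

where X and y are the training inputs and targets, k is the kernel induced by the network at infinite width, and Ξ» plays the role of the noise (temperature) regulariser.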

Business Value

Deeper theoretical understanding of how neural networks learn representations, potentially leading to more efficient and effective model design and training.