
A simple mean field model of feature learning

πŸ“„ Abstract

Feature learning (FL), where neural networks adapt their internal representations during training, remains poorly understood. Using methods from statistical physics, we derive a tractable, self-consistent mean-field (MF) theory for the Bayesian posterior of two-layer non-linear networks trained with stochastic gradient Langevin dynamics (SGLD). At infinite width, this theory reduces to kernel ridge regression, but at finite width it predicts a symmetry-breaking phase transition where networks abruptly align with target functions. While the basic MF theory provides theoretical insight into the emergence of FL in the finite-width regime, semi-quantitatively predicting the onset of FL with noise or sample size, it substantially underestimates the improvements in generalisation after the transition. We trace this discrepancy to a key mechanism absent from the plain MF description: 'self-reinforcing input feature selection'. Incorporating this mechanism into the MF theory allows us to quantitatively match the learning curves of SGLD-trained networks and provides mechanistic insight into FL.
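The SGLD training setup described in the abstract can be illustrated with a minimal sketch: full-batch Langevin dynamics sampling the Bayesian posterior of a two-layer tanh network on a synthetic single-index regression task. Every choice below (width, learning rate, temperature, prior variance, target function) is an illustrative assumption, not the paper's experimental setup.

```python
# Minimal sketch (not the paper's exact setup): SGLD sampling of the posterior
# of a two-layer tanh network on a synthetic 1-D regression task.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-index teacher: y = tanh(w* . x)
n_samples, dim, width = 64, 8, 128
w_star = rng.normal(size=dim) / np.sqrt(dim)
X = rng.normal(size=(n_samples, dim))
y = np.tanh(X @ w_star)

# Two-layer student: f(x) = a . tanh(W x) / sqrt(width)
W = rng.normal(size=(width, dim)) / np.sqrt(dim)
a = rng.normal(size=width)

lr, temperature, prior_var = 1e-3, 1e-3, 1.0  # illustrative hyperparameters

def forward(X, W, a):
    h = np.tanh(X @ W.T)              # hidden activations, shape (n, width)
    return h, h @ a / np.sqrt(width)  # network output, shape (n,)

for step in range(5000):
    h, pred = forward(X, W, a)
    err = pred - y                    # residuals
    # Gradients of the (unnormalised) negative log-posterior:
    # squared loss plus a Gaussian prior on both weight layers.
    grad_a = h.T @ err / np.sqrt(width) + a / prior_var
    grad_W = ((err[:, None] * (1 - h**2)) * (a / np.sqrt(width))).T @ X + W / prior_var
    # Langevin update: gradient step plus Gaussian noise at the chosen temperature.
    a -= lr * grad_a + np.sqrt(2 * lr * temperature) * rng.normal(size=a.shape)
    W -= lr * grad_W + np.sqrt(2 * lr * temperature) * rng.normal(size=W.shape)

_, pred = forward(X, W, a)
print("train MSE:", float(np.mean((pred - y) ** 2)))
```

In the paper's setting the relevant quantities are posterior averages over many such Langevin trajectories; the loop above only shows the update rule itself.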
Authors (4): Niclas GΓΆring, Chris Mingard, Yoonsoo Nam, Ard Louis
Submitted: October 16, 2025
arXiv Category: cs.LG

Key Contributions

This paper derives a tractable mean-field (MF) theory for feature learning in two-layer neural networks trained with SGLD, using methods from statistical physics. The theory predicts a finite-width phase transition at which networks abruptly align with target functions, and identifies 'self-reinforcing input feature selection' as a key mechanism, absent from the basic MF theory, that is needed to explain the improved generalisation observed after the transition.
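As context for the infinite-width limit mentioned in the abstract, the theory there reduces to kernel ridge regression, whose posterior-mean predictor takes the standard form below. The specific kernel k and ridge parameter Ξ» implied by the paper's MF theory are not given in this summary, so both are placeholders.

$$
f^{*}(x) \;=\; k(x, X)^{\top}\,\bigl(K + \lambda I\bigr)^{-1} y,
\qquad K_{ij} = k(x_i, x_j),
$$

where X and y are the training inputs and targets, k is the kernel induced by the network at infinite width, and Ξ» plays the role of the noise (temperature) regulariser.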

Business Value

Deeper theoretical understanding of how neural networks learn representations, potentially leading to more efficient and effective model design and training.