
The $\varphi$ Curve: The Shape of Generalization through the Lens of Norm-based Capacity Control

📄 Abstract

Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parameterized deep networks, and capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random-features-based estimators, widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve admits a phase transition from under- to over-parameterization, but no double descent behavior. This confirms that the more classical U-shaped behavior is recovered when considering appropriate capacity measures based on model norms rather than model size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities which are of independent interest.
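The abstract's setting can be illustrated with a small numerical sketch: fit a random-features regression model at several widths, and record both the estimator's norm (the capacity measure the paper advocates) and the test error. This is a minimal illustration, not the paper's exact setting; the ReLU activation, the dimensions, and the pseudoinverse-based minimum-norm solver are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression task: y = <x, w*> + noise
d, n_train, n_test = 20, 100, 500
w_star = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_star + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ w_star

def random_features(X, W):
    # ReLU random features: phi(x) = max(0, W x)
    return np.maximum(X @ W.T, 0.0)

def fit_min_norm(Phi, y):
    # Minimum-norm least-squares solution via the pseudoinverse
    return np.linalg.pinv(Phi) @ y

results = []
for p in [10, 50, 100, 200, 400]:  # number of random features (model width)
    W = rng.standard_normal((p, d)) / np.sqrt(d)  # frozen random weights
    theta = fit_min_norm(random_features(X_train, W), y_train)
    norm = np.linalg.norm(theta)  # norm-based capacity of the estimator
    test_mse = np.mean((random_features(X_test, W) @ theta - y_test) ** 2)
    results.append((p, norm, test_mse))

for p, norm, err in results:
    print(f"p={p:4d}  ||theta||={norm:8.3f}  test MSE={err:.4f}")
```

Plotting test error against the estimator norm (rather than against `p`) is the kind of re-parameterization under which, per the paper's analysis, the curve is expected to look classically U-shaped instead of exhibiting double descent.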
Authors (4)
Yichen Wang
Yudong Chen
Lorenzo Rosasco
Fanghui Liu
Submitted
February 3, 2025
arXiv Category
stat.ML
arXiv PDF

Key Contributions

This paper provides a theoretical framework, the 'φ Curve', for understanding generalization in machine learning through norm-based capacity measures rather than parameter counts. For random features models, it precisely characterizes how the concentration of the estimator's norm governs the test error, revealing a phase transition from under- to over-parameterization but no double descent, thereby recovering the classical U-shaped behavior.

Business Value

Deepens the fundamental understanding of why and how deep learning models generalize, which can guide the development of more robust and predictable AI systems.