
The $\varphi$ Curve: The Shape of Generalization through the Lens of Norm-based Capacity Control

📄 Abstract

Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parameterized deep networks, and capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random-features-based estimators, widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve admits a phase transition from under- to over-parameterization, but no double descent behavior. This confirms that the more classical U-shaped behavior is recovered when considering appropriate capacity measures based on model norms rather than model size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities which are of independent interest.
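The abstract's setting can be illustrated with a small numerical sketch: fit a random-features regression model at several widths, and record both the estimator's norm (the capacity measure the paper advocates) and the test error. This is a minimal illustration, not the paper's exact setting; the ReLU activation, the dimensions, and the pseudoinverse-based minimum-norm solver are assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression task: y = <x, w*> + noise
d, n_train, n_test = 20, 100, 500
w_star = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_star + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ w_star

def random_features(X, W):
    # ReLU random features: phi(x) = max(0, W x)
    return np.maximum(X @ W.T, 0.0)

def fit_min_norm(Phi, y):
    # Minimum-norm least-squares solution via the pseudoinverse
    return np.linalg.pinv(Phi) @ y

results = []
for p in [10, 50, 100, 200, 400]:  # number of random features (model width)
    W = rng.standard_normal((p, d)) / np.sqrt(d)  # frozen random weights
    theta = fit_min_norm(random_features(X_train, W), y_train)
    norm = np.linalg.norm(theta)  # norm-based capacity of the estimator
    test_mse = np.mean((random_features(X_test, W) @ theta - y_test) ** 2)
    results.append((p, norm, test_mse))

for p, norm, err in results:
    print(f"p={p:4d}  ||theta||={norm:8.3f}  test MSE={err:.4f}")
```

Plotting test error against the estimator norm (rather than against `p`) is the kind of re-parameterization under which, per the paper's analysis, the curve is expected to look classically U-shaped instead of exhibiting double descent.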
Authors (4)
Yichen Wang
Yudong Chen
Lorenzo Rosasco
Fanghui Liu
Submitted
February 3, 2025
arXiv Category
stat.ML
arXiv PDF

Key Contributions

This paper provides a theoretical framework, the 'φ Curve', for understanding generalization in machine learning through norm-based capacity measures rather than parameter counts. For random features models, it precisely characterizes how the concentration of the estimator's norm governs the test error, revealing a phase transition from under- to over-parameterization but no double descent, thereby recovering the classical U-shaped behavior.

Business Value

Deepens the fundamental understanding of why and how deep learning models generalize, which can guide the development of more robust and predictable AI systems.