📄 Abstract
Weak-to-strong generalization refers to the phenomenon where a stronger model
trained under supervision from a weaker one can outperform its teacher. While
prior studies aim to explain this effect, most theoretical insights are limited
to abstract frameworks or linear/random feature models. In this paper, we
provide a formal analysis of weak-to-strong generalization from a linear CNN
(weak) to a two-layer ReLU CNN (strong). We consider structured data composed
of label-dependent signals of varying difficulty and label-independent noise,
and analyze gradient descent dynamics when the strong model is trained on data
labeled by the pretrained weak model. Our analysis identifies two regimes --
data-scarce and data-abundant -- based on the signal-to-noise characteristics
of the dataset, and reveals distinct mechanisms of weak-to-strong
generalization. In the data-scarce regime, generalization occurs via benign
overfitting or fails via harmful overfitting, depending on the amount of data,
and we characterize the transition boundary. In the data-abundant regime,
generalization emerges in the early phase through label correction, but we
observe that overtraining can subsequently degrade performance.
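The abstract describes a concrete pipeline: a pretrained linear CNN (the weak teacher) labels data consisting of label-dependent signal patches of varying difficulty plus label-independent noise, and a two-layer ReLU CNN (the strong student) is then trained on those weak labels with gradient descent. Below is a minimal, hypothetical sketch of that pipeline in PyTorch; all names, dimensions, widths, and signal-to-noise values are illustrative assumptions and do not reproduce the paper's exact construction or analysis.

```python
# Hypothetical sketch of the weak-to-strong setup sketched in the abstract.
# Dimensions, widths, and SNR values below are assumptions, not the paper's.
import torch
import torch.nn as nn

torch.manual_seed(0)
P, d = 4, 64  # patches per example, patch dimension (assumed)

def make_data(n, easy_snr=3.0, hard_snr=0.8, noise_std=1.0, v_easy=None, v_hard=None):
    """One label-dependent signal patch (easy or hard) plus label-independent noise patches."""
    v_easy = torch.randn(d) if v_easy is None else v_easy
    v_hard = torch.randn(d) if v_hard is None else v_hard
    y = (torch.randint(0, 2, (n,)) * 2 - 1).float()       # labels in {-1, +1}
    X = noise_std * torch.randn(n, P, d)                   # noise patches
    easy = torch.rand(n) < 0.5
    for i in range(n):
        sig = easy_snr * v_easy if easy[i] else hard_snr * v_hard
        X[i, 0] = y[i] * sig                               # signal patch carries the label
    return X, y, (v_easy, v_hard)

class LinearCNN(nn.Module):
    """Weak teacher: one linear filter applied to every patch, then summed."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(d))
    def forward(self, X):
        return torch.einsum('npd,d->n', X, self.w)

class TwoLayerReluCNN(nn.Module):
    """Strong student: m positive and m negative ReLU filters shared across patches."""
    def __init__(self, m=16, init_scale=0.01):
        super().__init__()
        self.w_pos = nn.Parameter(init_scale * torch.randn(m, d))
        self.w_neg = nn.Parameter(init_scale * torch.randn(m, d))
    def forward(self, X):
        pos = torch.relu(torch.einsum('npd,md->nmp', X, self.w_pos)).sum(dim=(1, 2))
        neg = torch.relu(torch.einsum('npd,md->nmp', X, self.w_neg)).sum(dim=(1, 2))
        return (pos - neg) / self.w_pos.shape[0]

def train(model, X, y, lr=0.05, steps=500):
    """Full-batch gradient descent on the logistic loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.softplus(-y * model(X)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# 1) Pretrain the weak teacher on a small, cleanly labeled set.
Xw, yw, signals = make_data(n=128)
teacher = train(LinearCNN(), Xw, yw)

# 2) The teacher labels a fresh set; the student trains on those weak labels.
Xs, _, _ = make_data(n=512, v_easy=signals[0], v_hard=signals[1])
weak_labels = torch.sign(teacher(Xs)).detach()
student = train(TwoLayerReluCNN(), Xs, weak_labels)

# 3) Compare teacher and student against the true labels on held-out data.
Xt, yt, _ = make_data(n=2048, v_easy=signals[0], v_hard=signals[1])
acc = lambda f: (torch.sign(f(Xt)) == yt).float().mean().item()
print(f"teacher acc: {acc(teacher):.3f}  student acc: {acc(student):.3f}")
```

A sketch like this can be used to probe the regimes the abstract mentions, e.g. by varying the student's sample size, the noise level, or the number of training steps and checking whether the student's accuracy exceeds the teacher's.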
Authors (3)
Junsoo Oh
Jerry Song
Chulhee Yun
Submitted
October 28, 2025
Key Contributions
This paper provides a formal theoretical analysis of weak-to-strong generalization in deep learning, specifically from a linear CNN teacher to a two-layer ReLU CNN student. It identifies distinct mechanisms and regimes (data-scarce vs. data-abundant) governing this phenomenon, offering insight into how a strong model trained on weak labels can surpass its teacher through feature learning and gradient descent dynamics.
Business Value
Enhances fundamental understanding of deep learning, potentially informing more robust and efficient strategies for supervising strong models with weaker ones.