Abstract
We study the implicit bias toward flat (low loss-curvature) minima and its effect on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima-stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap of flat solutions and (2) the mean-squared error (MSE) of nonparametric function estimation by stable minima, we prove upper and lower bounds establishing that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower-bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of "neural shattering" in which neurons activate only rarely but carry large weight magnitudes, leading to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
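To make the "neural shattering" picture concrete, here is a minimal NumPy sketch, an illustration under assumed parameter choices (spherical inputs, a fixed bias margin) and not the paper's packing construction. It places boundary-localized ReLU neurons on spherical data and reports how their activation frequency collapses with the input dimension, while the output-weight scale needed to compensate for rare firing grows.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# boundary-localized ReLU neurons activate on a vanishing fraction of inputs
# as the dimension d grows, so matching a target requires large output weights.
import numpy as np

rng = np.random.default_rng(0)

def shattering_stats(d, n=20_000, m=200, margin=0.5):
    """Activation statistics of boundary-localized ReLU neurons in dimension d."""
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs uniform on the unit sphere
    W = rng.standard_normal((m, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit inner-layer weight vectors
    b = -margin * np.ones(m)                        # bias pushes the active region to a thin boundary cap
    pre = X @ W.T + b                               # (n, m) pre-activations
    act_freq = (pre > 0).mean()                     # fraction of (input, neuron) pairs that fire
    out_scale = 1.0 / max(act_freq, 1e-12)          # large output weights offset rare firing
    return act_freq, out_scale

for d in (2, 8, 32, 64):
    freq, scale = shattering_stats(d)
    print(f"d={d:3d}  activation frequency={freq:.5f}  compensating output scale~{scale:.1f}")
```

Under these assumptions the printed activation frequency decays rapidly with d, which is the qualitative mechanism the abstract attributes to flat solutions in high dimensions.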
Authors (4)
Tongtong Liang
Dan Qiao
Yu-Xiang Wang
Rahul Parhi
Key Contributions
This paper proves that stable minima (flat solutions) of overparameterized ReLU networks with multivariate inputs suffer from the curse of dimensionality: their generalization rates deteriorate exponentially with the input dimension. This establishes an exponential separation between flatness-based implicit regularization and low-norm solutions (weight decay), revealing a fundamental limitation of flatness as an inductive bias.
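The two quantities being separated can be probed empirically with simple proxies. Below is a hedged PyTorch sketch, in which the data, network width, and training loop are illustrative assumptions rather than the paper's experimental setup. It trains a small two-layer ReLU network and then reports a flatness proxy (the top Hessian eigenvalue of the MSE loss, estimated by power iteration) alongside a norm proxy (the path-norm-style quantity sum over neurons of |a_j| * ||w_j||).

```python
# Hedged sketch: compare a flatness proxy with a norm proxy on a toy
# two-layer ReLU regression problem. All hyperparameters are assumptions.
import torch

torch.manual_seed(0)
d, m, n = 8, 64, 256
X = torch.randn(n, d)
y = torch.sin(X[:, 0]).unsqueeze(1)                  # simple single-direction target

W = (torch.randn(m, d) / d**0.5).requires_grad_()    # inner-layer weights
a = (torch.randn(m, 1) / m**0.5).requires_grad_()    # output weights
params = [W, a]

def loss_fn():
    return torch.mean((torch.relu(X @ W.T) @ a - y) ** 2)

# Short gradient-descent run as a stand-in for reaching a (stable) minimum.
opt = torch.optim.SGD(params, lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss_fn().backward()
    opt.step()

def top_hessian_eig(iters=50):
    """Power iteration on the loss Hessian using Hessian-vector products."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)), params)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)), params)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()   # Rayleigh quotient

path_norm = (a.abs().squeeze() * W.norm(dim=1)).sum().item()
print(f"flatness proxy (top Hessian eigenvalue): {top_hessian_eig():.4f}")
print(f"norm proxy (sum |a_j| * ||w_j||):        {path_norm:.4f}")
```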
Business Value
Provides fundamental insight into a limitation of flat minima in deep learning, guiding the development of regularization strategies and architectures that remain robust and scalable, and are less susceptible to the curse of dimensionality.