Abstract
We study the implicit bias toward flat (low loss-curvature) minima and its effect on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima-stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap of flat solutions and (2) the mean-squared error (MSE) of nonparametric function estimation by stable minima, we prove upper and lower bounds establishing that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower-bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of "neural shattering" in which neurons activate only rarely but carry large weight magnitudes, leading to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
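To make the "neural shattering" picture concrete, here is a minimal NumPy sketch, an illustration under assumed parameter choices (spherical inputs, a fixed bias margin) and not the paper's packing construction. It places boundary-localized ReLU neurons on spherical data and reports how their activation frequency collapses with the input dimension, while the output-weight scale needed to compensate for rare firing grows.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# boundary-localized ReLU neurons activate on a vanishing fraction of inputs
# as the dimension d grows, so matching a target requires large output weights.
import numpy as np

rng = np.random.default_rng(0)

def shattering_stats(d, n=20_000, m=200, margin=0.5):
    """Activation statistics of boundary-localized ReLU neurons in dimension d."""
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs uniform on the unit sphere
    W = rng.standard_normal((m, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit inner-layer weight vectors
    b = -margin * np.ones(m)                        # bias pushes the active region to a thin boundary cap
    pre = X @ W.T + b                               # (n, m) pre-activations
    act_freq = (pre > 0).mean()                     # fraction of (input, neuron) pairs that fire
    out_scale = 1.0 / max(act_freq, 1e-12)          # large output weights offset rare firing
    return act_freq, out_scale

for d in (2, 8, 32, 64):
    freq, scale = shattering_stats(d)
    print(f"d={d:3d}  activation frequency={freq:.5f}  compensating output scale~{scale:.1f}")
```

Under these assumptions the printed activation frequency decays rapidly with d, which is the qualitative mechanism the abstract attributes to flat solutions in high dimensions.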
Authors (4)
Tongtong Liang
Dan Qiao
Yu-Xiang Wang
Rahul Parhi
Key Contributions
This paper proves that stable minima (flat solutions) of overparameterized ReLU networks with multivariate inputs suffer from the curse of dimensionality: their generalization rates deteriorate exponentially with the input dimension. This establishes an exponential separation between flatness-based implicit regularization and low-norm solutions (weight decay), revealing a fundamental limitation of flatness as an inductive bias.
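The two quantities being separated can be probed empirically with simple proxies. Below is a hedged PyTorch sketch, in which the data, network width, and training loop are illustrative assumptions rather than the paper's experimental setup. It trains a small two-layer ReLU network and then reports a flatness proxy (the top Hessian eigenvalue of the MSE loss, estimated by power iteration) alongside a norm proxy (the path-norm-style quantity sum over neurons of |a_j| * ||w_j||).

```python
# Hedged sketch: compare a flatness proxy with a norm proxy on a toy
# two-layer ReLU regression problem. All hyperparameters are assumptions.
import torch

torch.manual_seed(0)
d, m, n = 8, 64, 256
X = torch.randn(n, d)
y = torch.sin(X[:, 0]).unsqueeze(1)                  # simple single-direction target

W = (torch.randn(m, d) / d**0.5).requires_grad_()    # inner-layer weights
a = (torch.randn(m, 1) / m**0.5).requires_grad_()    # output weights
params = [W, a]

def loss_fn():
    return torch.mean((torch.relu(X @ W.T) @ a - y) ** 2)

# Short gradient-descent run as a stand-in for reaching a (stable) minimum.
opt = torch.optim.SGD(params, lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss_fn().backward()
    opt.step()

def top_hessian_eig(iters=50):
    """Power iteration on the loss Hessian using Hessian-vector products."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)), params)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    hv = torch.autograd.grad(sum((g * u).sum() for g, u in zip(grads, v)), params)
    return sum((h * u).sum() for h, u in zip(hv, v)).item()   # Rayleigh quotient

path_norm = (a.abs().squeeze() * W.norm(dim=1)).sum().item()
print(f"flatness proxy (top Hessian eigenvalue): {top_hessian_eig():.4f}")
print(f"norm proxy (sum |a_j| * ||w_j||):        {path_norm:.4f}")
```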
Business Value
Provides fundamental insight into a limitation of flat minima in deep learning, guiding the development of regularization strategies and architectures that remain robust and scalable, and are less susceptible to the curse of dimensionality.