Abstract

Residual networks have significantly better trainability, and thus performance, than feed-forward networks at large depth. Introducing skip connections facilitates signal propagation to deeper layers. In addition, previous works found that adding a scaling parameter for the residual branch further improves generalization performance. While they empirically identified a particularly beneficial range of values for this scaling parameter, the associated performance improvement and its universality across network hyperparameters have yet to be understood. For feed-forward networks, finite-size theories have led to important insights into signal propagation and hyperparameter tuning. Here, we derive a systematic finite-size field theory for residual networks to study signal propagation and its dependence on the scaling of the residual branch. We derive analytical expressions for the response function, a measure of the network's sensitivity to inputs, and show that for deep networks the empirically found values of the scaling parameter lie within the range of maximal sensitivity. Furthermore, we obtain an analytical expression for the optimal scaling parameter that depends only weakly on other network hyperparameters, such as the weight variance, thereby explaining its universality across hyperparameters. Overall, this work provides a theoretical framework to study ResNets at finite size.
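To make the architecture under study concrete, here is a minimal sketch of a deep stack of residual blocks in which the residual branch is multiplied by a scaling parameter alpha. This is an illustration only: the class name ScaledResidualBlock, the tanh nonlinearity, the 1/width weight-variance initialization, and the specific depth, width, and alpha values are assumptions for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """One residual block with a scaled residual branch:
    x_{l+1} = x_l + alpha * tanh(W_l x_l)   (illustrative form)."""

    def __init__(self, width: int, alpha: float, weight_std: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.linear = nn.Linear(width, width, bias=False)
        # Weight variance weight_std**2 / width -- the scaling regime in which
        # mean-field / field-theoretic analyses are typically formulated.
        nn.init.normal_(self.linear.weight, std=weight_std / width**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection plus scaled residual branch.
        return x + self.alpha * torch.tanh(self.linear(x))

# A deep stack: a modest alpha keeps signal magnitudes controlled at large depth.
depth, width, alpha = 64, 128, 0.2
net = nn.Sequential(*[ScaledResidualBlock(width, alpha) for _ in range(depth)])
x = torch.randn(1, width)
print(net(x).norm())
```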
Authors
Kirsten Fischer
David Dahmen
Moritz Helias
arXiv Category
cond-mat.dis-nn
Key Contributions
The paper derives a systematic finite-size field theory for residual networks (ResNets) to analyze signal propagation and its dependence on the scaling parameter of the residual branch, providing analytical expressions for the response function and thereby insights into trainability and generalization.
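As a rough numerical companion to the response-function idea, one can probe a network's sensitivity to its input via the Frobenius norm of the input-output Jacobian and scan the residual scaling alpha. This is only a crude, single-realization proxy: the paper's response function is an analytical, finite-size field-theoretic quantity, and the function names, nonlinearity, and parameter values below are assumptions for illustration.

```python
import torch

def scaled_resnet(x, weights, alpha):
    # x_{l+1} = x_l + alpha * tanh(W_l x_l), iterated over all layers.
    for W in weights:
        x = x + alpha * torch.tanh(x @ W.T)
    return x

def sensitivity(alpha, depth=64, width=128, weight_std=1.0):
    # Frobenius norm of the input-output Jacobian: a crude numerical
    # stand-in for the sensitivity that the response function captures.
    weights = [torch.randn(width, width) * weight_std / width**0.5
               for _ in range(depth)]
    x = torch.randn(width)
    jac = torch.autograd.functional.jacobian(
        lambda v: scaled_resnet(v, weights, alpha), x)
    return float(jac.norm())

# Scan the residual scaling and compare sensitivities (toy proxy only).
for alpha in (0.05, 0.2, 0.5, 1.0):
    print(f"alpha={alpha}: |J|_F ~ {sensitivity(alpha):.2f}")
```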
Business Value
Contributes to a deeper theoretical understanding of deep neural networks, potentially leading to more principled design and optimization of future network architectures.