
Depth-Bounds for Neural Networks via the Braid Arrangement

📄 Abstract

We contribute towards resolving the open question of how many hidden layers are required in ReLU networks for exactly representing all continuous and piecewise linear functions on $\mathbb{R}^d$. While the question has been resolved in special cases, the best known lower bound in general is still 2. We focus on neural networks that are compatible with certain polyhedral complexes, more precisely with the braid fan. For such neural networks, we prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to exactly represent the maximum of $d$ numbers. Additionally, under our assumption, we provide a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers; previously, this had only been verified by an extensive computation. Finally, we show that a natural generalization of the best known upper bound to maxout networks is not tight, by demonstrating that a rank-3 maxout layer followed by a rank-2 maxout layer is sufficient to represent the maximum of 7 numbers.
Authors (3)
Moritz Grillo
Christoph Hertrich
Georg Loho
Submitted
February 13, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

This paper establishes non-constant lower bounds on the depth of ReLU networks representing certain piecewise linear functions, proving that $\Omega(\log\log d)$ hidden layers are needed for the maximum of $d$ numbers under assumptions related to the braid fan. It also gives a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers, and shows that the natural generalization of the best known upper bound to maxout networks is not tight.
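As background for the depth question (this is the classical upper-bound construction, not the paper's own method): the maximum of $d$ numbers can be represented exactly with $\lceil \log_2 d \rceil$ hidden ReLU layers by a pairwise tournament, using the identity $\max(a,b) = a + \mathrm{ReLU}(b-a)$, which one hidden layer can compute. A minimal Python sketch of this construction:

```python
def relu(x):
    """ReLU activation: the only nonlinearity available to the network."""
    return x if x > 0 else 0.0

def max2(a, b):
    # Exact identity: max(a, b) = a + ReLU(b - a),
    # representable with a single hidden ReLU layer.
    return a + relu(b - a)

def tournament_max(xs):
    """Maximum of d numbers via ceil(log2 d) rounds of pairwise maxima.

    Each round corresponds to one hidden layer in the classical
    upper-bound construction.
    """
    xs = list(xs)
    depth = 0
    while len(xs) > 1:
        nxt = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2 == 1:  # odd element is carried forward unchanged
            nxt.append(xs[-1])
        xs = nxt
        depth += 1
    return xs[0], depth

value, depth = tournament_max([3, 1, 4, 1, 5])
# For d = 5 the tournament uses 3 rounds, matching the paper's result
# that 3 hidden layers are necessary for the maximum of 5 numbers.
```

For $d = 5$ this gives $\lceil \log_2 5 \rceil = 3$ rounds, so the paper's lower bound shows the tournament construction is optimal for 5 inputs (under its braid-fan compatibility assumption), while the rank-3/rank-2 maxout result shows the analogous bound for maxout networks is not tight.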

Business Value

Contributes to a fundamental understanding of neural network expressivity, which can guide the design of more efficient and more powerful architectures.