On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth

Authors: Gennadiy Averkov, Christopher Hojny, Maximilian Merkert

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We follow up on this line of research and show that, within ReLU networks whose weights are decimal fractions, F_n can only be represented by networks with at least log_3(n + 1) hidden layers. Moreover, if all weights are N-ary fractions, then F_n can only be represented by networks with at least Ω(ln n / ln ln N) layers. These results are a partial confirmation of the above conjecture for rational ReLU networks, and provide the first non-constant lower bound on the depth of practically relevant ReLU networks. To prove our main results, Theorems 2 and 4, we extend the ideas of Haase et al. (2023). |
| Researcher Affiliation | Academia | Gennadiy Averkov (BTU Cottbus-Senftenberg, EMAIL); Christopher Hojny (TU Eindhoven, EMAIL); Maximilian Merkert (TU Braunschweig, EMAIL) |
| Pseudocode | No | The paper describes mathematical proofs and theoretical concepts (e.g., Theorem 2, Theorem 4, Proposition 11) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing source code or links to code repositories. |
| Open Datasets | No | The paper investigates the expressiveness of ReLU neural networks for the function F_n = max{0, x_1, ..., x_n} and does not describe experiments using external datasets. |
| Dataset Splits | No | The paper does not conduct experiments involving datasets, so no information about dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical, focusing on mathematical proofs and lower bounds for neural network depth, and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not detail an experimental setup or software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting mathematical proofs and analyses, and thus does not include details on experimental setup, hyperparameters, or training configurations. |
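For context on the function studied in the paper, F_n = max{0, x_1, ..., x_n} can be computed exactly by a ReLU network whose depth grows logarithmically in n, using the standard identity max(a, b) = b + max(0, a - b). The sketch below is purely illustrative (it is not code from the paper, which contains none); each iteration of the reduction loop corresponds to one hidden layer of such a network, so the loop count matches the classical O(log n) depth upper bound that the paper's lower bounds complement.

```python
def relu(x: float) -> float:
    """The ReLU activation max(0, x)."""
    return x if x > 0 else 0.0

def pairwise_max(a: float, b: float) -> float:
    # max(a, b) = b + max(0, a - b): one ReLU gate per pairwise maximum.
    return b + relu(a - b)

def F_n(xs: list[float]) -> float:
    """Compute F_n = max{0, x_1, ..., x_n} via a tree of ReLU max-gates.

    Each while-loop iteration halves the number of values and corresponds
    to one hidden layer, giving depth on the order of log2(n + 1).
    """
    vals = [0.0] + list(xs)  # include the constant 0 argument of the max
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1])
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:    # carry an unpaired value to the next layer
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]
```

Because every operation is an affine map or a ReLU, the computation really is a ReLU network evaluation; the paper's results concern how much this depth can be reduced when the weights are restricted to rational (e.g., decimal or N-ary) fractions.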