On the Local Complexity of Linear Regions in Deep ReLU Networks

Authors: Niket Nikul Patel, Guido Montúfar

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We replicate similar experiments in Figure 3. "In this work, we aim to develop theory to explain some of the empirical results of Humayun et al. (2024b)." and "We empirically demonstrate this behavior on the MNIST dataset in Figure 3." and "We empirically validate this claim in Figure 13, where we demonstrate that the local complexity will typically be lower for networks trained with a larger weight decay."
Researcher Affiliation Academia 1 Department of Mathematics, UCLA, USA; 2 Department of Statistics & Data Science, UCLA, USA; 3 MPI MiS, Germany. Correspondence to: Niket Patel <EMAIL>, Guido Montúfar <EMAIL>.
Pseudocode No The paper describes methods using mathematical formulations and textual explanations without presenting any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets Yes "We empirically demonstrate this behavior on the MNIST dataset in Figure 3." and "Specifically, we show similar trends for the CIFAR-10 (Krizhevsky & Hinton, 2009) and Imagenette (Howard, 2019) datasets."
Dataset Splits No "Here we train a 4 layer MLP with 200 neurons in each layer on a subset of 1000 images across all classes in the MNIST dataset." The paper mentions using subsets of datasets but does not provide specific training/test/validation splits or references to standard splits.
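The quoted setup trains on "a subset of 1000 images across all classes" without specifying how that subset is drawn. One plausible reading is an evenly balanced draw of 100 images per MNIST class; the sketch below illustrates that interpretation in NumPy with a synthetic stand-in for the MNIST label array (the paper does not confirm balancing, seed, or sampling method, so all of those are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_subset(labels, n_total, n_classes=10):
    # Pick n_total indices spread evenly across all classes
    # (one possible reading of "across all classes"; not confirmed by the paper).
    per_class = n_total // n_classes
    idx = []
    for c in range(n_classes):
        cls = np.flatnonzero(labels == c)
        idx.extend(rng.choice(cls, size=per_class, replace=False))
    return np.array(idx)

# Stand-in for the 60,000 MNIST training labels; real code would load the dataset.
labels = rng.integers(0, 10, size=60000)
subset = balanced_subset(labels, 1000)  # 100 indices per class
```

Because the review flags the absence of split details, even a simple convention like this would have to be stated explicitly for the experiment to be reproducible.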
Hardware Specification No The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies No The paper does not list specific software dependencies with version numbers.
Experiment Setup Yes "We train a 4 layer MLP with 200 neurons in each layer... We use an initialization scale that is 2x the standard He initialization." and "We train with the Adam optimizer with learning rate 1e-4."
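The quoted architecture (4-layer MLP, 200 neurons per hidden layer, initialization at 2x the standard He scale) can be sketched in NumPy. The 784-dimensional input and 10-class output follow from the MNIST setting; the seed, batch size, and the exact He variant (fan-in, normal) are illustrative assumptions, and the Adam training loop itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out, scale=2.0):
    # Standard He (fan-in) std is sqrt(2 / fan_in); the paper reports
    # using 2x that scale (scale=2.0 here).
    std = scale * np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# 4-layer MLP with 200 neurons per hidden layer: 784 -> 200 -> 200 -> 200 -> 10
sizes = [784, 200, 200, 200, 10]
weights = [he_init(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    # ReLU on hidden layers, linear output (logits).
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    return x @ weights[-1] + biases[-1]

logits = forward(rng.normal(size=(32, 784)))  # a batch of 32 flattened images
```

In a framework like PyTorch the same setup would use `kaiming_normal_` initialization multiplied by 2 and `Adam(lr=1e-4)`, matching the quoted hyperparameters.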