How many samples are needed to train a deep neural network?

Authors: Pegah Golestaneh, Mahsa Taheri, Johannes Lederer

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate 1/√n in the sample size n rather than the parametric rate 1/n, which could be suggested by traditional statistical theories. Thus, broadly speaking, our results underpin the common belief that neural networks need many training samples. Along the way, we also establish new technical insights, such as the first lower bounds of the entropy of ReLU feed-forward networks. ... In Section 5, we shift our focus to the empirical findings to support our theories. ... This section supports our theoretical findings with simulations on benchmark datasets.
Researcher Affiliation | Academia | Pegah Golestaneh, Mahsa Taheri & Johannes Lederer, Department of Mathematics, Computer Science, and Natural Sciences, University of Hamburg, EMAIL
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes methods mathematically and textually.
Open Source Code | No | The paper mentions that "The implementation of these neural networks was carried out using the TensorFlow library (see Appendix C for further details)." However, this refers to a third-party library used for implementation, not the authors' own source code for the methodology described in the paper. There is no explicit statement about releasing their code or a link to a repository.
Open Datasets | Yes | For our experiments, we consider both classification and regression tasks. The datasets used include MNIST, Fashion-MNIST and CIFAR10 for classification, and the California Housing Prices (CHP) dataset for regression analysis. ... For example, we imported the Fashion-MNIST dataset from the tensorflow.keras.datasets package.
Dataset Splits | Yes | The MNIST dataset consists of 60 000 training images and 10 000 testing images, each with dimensions of 28×28 pixels. ... The Fashion-MNIST dataset contains 60 000 training images and 10 000 testing images, both with dimensions of 28×28 pixels. ... The CIFAR10 dataset contains 50 000 training images and 10 000 testing images, both with dimensions of 32×32 pixels. ... The version considered in this study comprises 8 numeric input attributes and a dataset of 20 640 samples. These samples were randomly divided into 15 000 for the training data and the remaining for the test data.
Hardware Specification | Yes | 1. Computer resources: we conducted some of the experiments in Python using Google Colab and some of them using the basic plan of deepnote (https://deepnote.com). For the regression dataset, we used the basic plan of them that utilizes a machine with 5GB RAM and 2 vCPUs. For the CIFAR10 dataset, we used one of deepnote's plans that utilizes a machine with 16GB RAM and 4 vCPUs.
Software Dependencies | No | The implementation of these neural networks was carried out using the TensorFlow library (see Appendix C for further details). ... Optimizing these parameters is achieved through the Sequential Least Squares Quadratic Programming (SLSQP) method (Kraft, 1988) and the minimize function from scipy.optimize is employed for SLSQP implementation. ... In the training procedure for our experiments, we have used the Adam optimization method. The paper mentions software like TensorFlow, SciPy, and the Adam optimizer but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We use Cross-entropy (CE) and Mean-squared (MS) error as loss functions for classification and regression datasets, respectively. ... Optimizing these parameters is achieved through the Sequential Least Squares Quadratic Programming (SLSQP) method (Kraft, 1988) and the minimize function from scipy.optimize is employed for SLSQP implementation. The objective function calculates the sum of squared differences between the generalization error of a neural network and two separate curves. ... The batch size for the training samples is set to 20. ... In the training procedure for our experiments, we have used the Adam optimization method.
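As a quick numerical illustration of the paper's central claim — that the generalization error shrinks at the slow 1/√n rate rather than the parametric 1/n rate — the following sketch compares the two rates at a few sample sizes. The sample sizes and printed values are ours, not the paper's:

```python
import math

# Illustration (not from the paper): how the two candidate rates shrink
# with the sample size n. The paper argues that the generalization error
# of ReLU feed-forward networks follows the slower 1/sqrt(n) rate.
for n in [100, 10_000, 1_000_000]:
    slow = 1 / math.sqrt(n)  # 1/sqrt(n): the rate supported by the paper
    fast = 1 / n             # 1/n: the classical parametric rate
    print(f"n={n:>9,}  1/sqrt(n)={slow:.4f}  1/n={fast:.6f}")
```

At n = 10 000, for instance, 1/√n is 0.01 while 1/n is 0.0001 — a hundredfold gap, which is why the slower rate translates into "neural networks need many training samples."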
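The CHP split quoted in the Dataset Splits row (20 640 samples randomly divided into 15 000 for training and the remainder for testing) can be sketched with a random index permutation. This is a hypothetical reconstruction — the seed and variable names are our assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed is an assumption, for reproducibility
n_total, n_train = 20_640, 15_000  # CHP sizes quoted in the paper

perm = rng.permutation(n_total)    # randomly shuffle the sample indices
train_idx, test_idx = perm[:n_train], perm[n_train:]

print(len(train_idx), len(test_idx))  # 15000 5640
```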
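The curve-fitting step quoted in the Experiment Setup row — minimizing the sum of squared differences between the measured generalization error and two separate candidate curves with SLSQP via scipy.optimize.minimize — can be sketched on synthetic data. The sample sizes, the synthetic errors (made to decay like 2/√n), and all variable names are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic "generalization errors" decaying like 2/sqrt(n); these values
# stand in for the measured errors and are an assumption for illustration.
ns = np.array([200.0, 500.0, 1_000.0, 5_000.0, 10_000.0])
errors = 2.0 / np.sqrt(ns)

def fit_curve(curve):
    """Fit a scale c to one candidate curve by minimizing the sum of
    squared differences with SLSQP, mirroring the quoted setup."""
    objective = lambda c: np.sum((errors - c[0] * curve) ** 2)
    return minimize(objective, x0=[1.0], method="SLSQP")

fit_sqrt = fit_curve(1 / np.sqrt(ns))  # candidate rate: c / sqrt(n)
fit_lin = fit_curve(1 / ns)            # candidate rate: c / n
# On this data the 1/sqrt(n) curve leaves the smaller residual,
# which is the kind of comparison the paper's fitting procedure makes.
print(fit_sqrt.x[0], fit_sqrt.fun, fit_lin.fun)
```

Comparing the two residuals (`fit_sqrt.fun` vs `fit_lin.fun`) shows which rate better describes the error curve — the paper's empirical sections report that the 1/√n curve wins on their benchmarks.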