Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

Authors: Karl Hajjar, Lénaïc Chizat, Christophe Giraud

JMLR 2024

Reproducibility Variables
Research Type: Experimental
LLM Response: "We confirm our results with numerical experiments on image classification tasks, which additionally show a strong difference in behavior between various choices of activation functions that is not yet captured by theory. In this section we investigate numerically the behavior of the models previously introduced in this work, namely Naive-IP, IP-LLR, IP-bias, IP-non-centered and µP. In contrast to the theoretical analysis carried out in Sections 3, 4, and 5, we examine the performance of the models on a multi-class classification task (instead of a single output prediction) and we train them using mini-batch SGD (instead of single-sample SGD)."
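To make the quoted distinction concrete, below is a minimal sketch of a mini-batch SGD step on the cross-entropy loss of a plain linear softmax classifier. This is illustrative only: it is not the paper's code, and it does not implement the IP/µP parameterizations the quote names; the function names are ours. The only point it shows is the mini-batch gradient average (batch size B > 1) versus single-sample SGD (B = 1).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    """Mean cross-entropy of a linear classifier with logits X @ W."""
    probs = softmax(X @ W)
    return -np.log(probs[np.arange(len(y)), y]).mean()

def minibatch_sgd_step(W, X, y, lr=0.01):
    """One mini-batch SGD step: the gradient is averaged over the B
    samples in the batch (single-sample SGD is the special case B = 1)."""
    B = X.shape[0]
    probs = softmax(X @ W)           # (B, C) predicted class probabilities
    probs[np.arange(B), y] -= 1.0    # d(cross-entropy)/d(logits)
    grad = X.T @ probs / B           # (d, C), averaged over the mini-batch
    return W - lr * grad

# Usage on one synthetic MNIST-shaped batch (784 inputs, 10 classes):
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 784))
y = rng.integers(0, 10, size=512)
W = np.zeros((784, 10))
loss_before = cross_entropy(W, X, y)     # = log(10) at zero init
for _ in range(50):
    W = minibatch_sgd_step(W, X, y, lr=0.01)
assert cross_entropy(W, X, y) < loss_before
```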
Researcher Affiliation: Academia
LLM Response: "Karl Hajjar (EMAIL), Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, 91405 Orsay, France. Lénaïc Chizat (EMAIL), Institut de Mathématiques, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. Christophe Giraud (EMAIL), Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, 91405 Orsay, France."
Pseudocode: No
LLM Response: "The paper describes mathematical proofs and derivations, and presents an in-depth theoretical analysis of neural network parameterizations. It includes detailed mathematical lemmas and theorems. However, it does not feature any explicitly labeled 'Algorithm' or 'Pseudocode' sections, nor does it present any structured code-like blocks."
Open Source Code: Yes
LLM Response: "The code to reproduce the results of the numerical experiments can be found at: https://github.com/karl-hajjar/wide-networks."
Open Datasets: Yes
LLM Response: "We evaluate the performance of the different models on two datasets: MNIST, containing 60,000 training samples and 10,000 test samples, and CIFAR-10, containing 50,000 training samples and 10,000 test samples." (Dataset links given in footnotes: MNIST: http://yann.lecun.com/exdb/mnist/; CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html)
Dataset Splits: Yes
LLM Response: "We evaluate the performance of the different models on two datasets: MNIST, containing 60,000 training samples and 10,000 test samples, and CIFAR-10, containing 50,000 training samples and 10,000 test samples. Both datasets consist in a 10-class image classification task."
Hardware Specification: No
LLM Response: "The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used for conducting the numerical experiments."
Software Dependencies: No
LLM Response: "The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments."
Experiment Setup: Yes
LLM Response: "We train for 600 SGD steps on MNIST and 1200 steps on CIFAR-10 using a base learning rate η = 0.01, a batch size B = 512, and the cross-entropy loss, which satisfies Assumption 1. For each experiment, we run N_trials = 5 trials with different random initializations. The hyperparameters are summarized in Table 2."
Table 2 (reconstructed): L = 6, m = 1024, d_MNIST = 784, d_CIFAR = 1024, ℓ = cross-entropy, η = 0.01, B = 512, N_trials = 5.
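The quoted hyperparameters can be collected into a small config object, which also makes explicit how many passes over each training set the stated step budgets imply. The values come from the paper's Table 2; the `Config` and `epochs_covered` names are illustrative, not from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    # Hyperparameters quoted from the paper's Table 2
    L: int = 6              # number of layers
    m: int = 1024           # width
    d_mnist: int = 784      # input dimension, MNIST
    d_cifar: int = 1024     # input dimension, CIFAR-10
    lr: float = 0.01        # base learning rate η
    batch_size: int = 512   # B
    n_trials: int = 5       # random initializations per experiment
    steps_mnist: int = 600  # SGD steps on MNIST
    steps_cifar: int = 1200 # SGD steps on CIFAR-10

def epochs_covered(steps: int, batch_size: int, n_train: int) -> float:
    """Number of passes over the training set implied by a step budget."""
    return steps * batch_size / n_train

cfg = Config()
# 600 steps of batch 512 over 60,000 MNIST samples: about 5.1 epochs
print(epochs_covered(cfg.steps_mnist, cfg.batch_size, 60_000))
# 1200 steps of batch 512 over 50,000 CIFAR-10 samples: about 12.3 epochs
print(epochs_covered(cfg.steps_cifar, cfg.batch_size, 50_000))
```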