Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

Authors: Karl Hajjar, Lénaïc Chizat, Christophe Giraud

JMLR 2024

Reproducibility Variables
Research Type: Experimental
LLM Response: "We confirm our results with numerical experiments on image classification tasks, which additionally show a strong difference in behavior between various choices of activation functions that is not yet captured by theory. In this section we investigate numerically the behavior of the models previously introduced in this work, namely Naive-IP, IP-LLR, IP-bias, IP-non-centered and µP. In contrast to the theoretical analysis carried out in Sections 3, 4, and 5, we examine the performance of the models on a multi-class classification task (instead of a single output prediction) and we train them using mini-batch SGD (instead of single-sample SGD)."
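To make the quoted distinction concrete, below is a minimal sketch of a mini-batch SGD step on the cross-entropy loss of a plain linear softmax classifier. This is illustrative only: it is not the paper's code, and it does not implement the IP/µP parameterizations the quote names; the function names are ours. The only point it shows is the mini-batch gradient average (batch size B > 1) versus single-sample SGD (B = 1).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, X, y):
    """Mean cross-entropy of a linear classifier with logits X @ W."""
    probs = softmax(X @ W)
    return -np.log(probs[np.arange(len(y)), y]).mean()

def minibatch_sgd_step(W, X, y, lr=0.01):
    """One mini-batch SGD step: the gradient is averaged over the B
    samples in the batch (single-sample SGD is the special case B = 1)."""
    B = X.shape[0]
    probs = softmax(X @ W)           # (B, C) predicted class probabilities
    probs[np.arange(B), y] -= 1.0    # d(cross-entropy)/d(logits)
    grad = X.T @ probs / B           # (d, C), averaged over the mini-batch
    return W - lr * grad

# Usage on one synthetic MNIST-shaped batch (784 inputs, 10 classes):
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 784))
y = rng.integers(0, 10, size=512)
W = np.zeros((784, 10))
loss_before = cross_entropy(W, X, y)     # = log(10) at zero init
for _ in range(50):
    W = minibatch_sgd_step(W, X, y, lr=0.01)
assert cross_entropy(W, X, y) < loss_before
```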
Researcher Affiliation: Academia
LLM Response: "Karl Hajjar (EMAIL), Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, 91405 Orsay, France. Lénaïc Chizat (EMAIL), Institut de Mathématiques, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. Christophe Giraud (EMAIL), Laboratoire de Mathématiques d'Orsay, Université Paris-Saclay, 91405 Orsay, France."
Pseudocode: No
LLM Response: "The paper describes mathematical proofs and derivations, and presents an in-depth theoretical analysis of neural network parameterizations. It includes detailed mathematical lemmas and theorems. However, it does not feature any explicitly labeled 'Algorithm' or 'Pseudocode' sections, nor does it present any structured code-like blocks."
Open Source Code: Yes
LLM Response: "The code to reproduce the results of the numerical experiments can be found at: https://github.com/karl-hajjar/wide-networks."
Open Datasets: Yes
LLM Response: "We evaluate the performance of the different models on two datasets: MNIST, containing 60,000 training samples and 10,000 test samples, and CIFAR-10, containing 50,000 training samples and 10,000 test samples." (Dataset links given in footnotes: MNIST: http://yann.lecun.com/exdb/mnist/; CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html)
Dataset Splits: Yes
LLM Response: "We evaluate the performance of the different models on two datasets: MNIST, containing 60,000 training samples and 10,000 test samples, and CIFAR-10, containing 50,000 training samples and 10,000 test samples. Both datasets consist in a 10-class image classification task."
Hardware Specification: No
LLM Response: "The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used for conducting the numerical experiments."
Software Dependencies: No
LLM Response: "The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments."
Experiment Setup: Yes
LLM Response: "We train for 600 SGD steps on MNIST and 1200 steps on CIFAR-10 using a base learning rate η = 0.01, a batch size B = 512, and the cross-entropy loss, which satisfies Assumption 1. For each experiment, we run N_trials = 5 trials with different random initializations. The hyperparameters are summarized in Table 2."
Table 2 (reconstructed): L = 6, m = 1024, d_MNIST = 784, d_CIFAR = 1024, ℓ = cross-entropy, η = 0.01, B = 512, N_trials = 5.
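The quoted hyperparameters can be collected into a small config object, which also makes explicit how many passes over each training set the stated step budgets imply. The values come from the paper's Table 2; the `Config` and `epochs_covered` names are illustrative, not from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    # Hyperparameters quoted from the paper's Table 2
    L: int = 6              # number of layers
    m: int = 1024           # width
    d_mnist: int = 784      # input dimension, MNIST
    d_cifar: int = 1024     # input dimension, CIFAR-10
    lr: float = 0.01        # base learning rate η
    batch_size: int = 512   # B
    n_trials: int = 5       # random initializations per experiment
    steps_mnist: int = 600  # SGD steps on MNIST
    steps_cifar: int = 1200 # SGD steps on CIFAR-10

def epochs_covered(steps: int, batch_size: int, n_train: int) -> float:
    """Number of passes over the training set implied by a step budget."""
    return steps * batch_size / n_train

cfg = Config()
# 600 steps of batch 512 over 60,000 MNIST samples: about 5.1 epochs
print(epochs_covered(cfg.steps_mnist, cfg.batch_size, 60_000))
# 1200 steps of batch 512 over 50,000 CIFAR-10 samples: about 12.3 epochs
print(epochs_covered(cfg.steps_cifar, cfg.batch_size, 50_000))
```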