How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Authors: Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our central goal is to paint a complete picture of how two-layer neural networks adapt to the features of training data (z^ν, y^ν)_{ν=1}^n ∈ R^{d+1} in the early phase of training after the first few steps of gradient descent. ... Appendix A. Numerical investigation. In this section we explain the procedures to get the different figures in the main text, along with the details behind the numerical experiments. We provide as well additional plots corroborating the theoretical results presented in the main manuscript. The code is available on GitHub. |
| Researcher Affiliation | Academia | Yatin Dandi EMAIL Florent Krzakala EMAIL Bruno Loureiro EMAIL Luca Pesce EMAIL Ludovic Stephan EMAIL Information, Learning and Physics (IdePHICS) Laboratory, École Polytechnique Fédérale de Lausanne, Route Cantonale, 1015 Lausanne, Switzerland; Département d'Informatique, École Normale Supérieure, PSL & CNRS, 45 rue d'Ulm, F-75230 Paris cedex 05, France; Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France; Statistical Physics of Computation (SPOC) Laboratory, École Polytechnique Fédérale de Lausanne, Route Cantonale, 1015 Lausanne, Switzerland |
| Pseudocode | Yes | Appendix A. Numerical investigation... Description of training algorithm and hyperparameters: First, we describe the training protocol reported in Alg. 1: we separately update the first layer with T GD steps of learning rate η, followed by training with standard ridge regression for the second layer with fixed regularization strength λ. ... Algorithm 1 Training procedure |
| Open Source Code | Yes | The code to reproduce our figures is available on GitHub, and we refer to App. A for details on the numerical implementations. |
| Open Datasets | No | In this work, we focus on a popular synthetic data model consisting of: a) independently drawn standard Gaussian covariates z^ν ∼ N(0, I_d); b) a target function y^ν = f(z^ν) depending only on a finite number of relevant directions, also known as a multi-index model. |
| Dataset Splits | No | for every gradient step t ≤ T, a fresh batch of training data {(z^ν, y^ν)}_{ν=1}^n is drawn from the model in Assumption 1, and the first layer weights are updated according to: ... (ii) Second layer training: once the first layer is trained for T steps, the second layer weights a are trained to optimality on an independent batch of data by performing ridge regression with the features learned in the first step: ... We note that the paper uses synthetically generated data in batches rather than predefined train/test/validation splits from a fixed dataset. |
| Hardware Specification | No | The paper mentions parameters like 'd = 28', 'd = 512', 'p = 256', 'p = 1024', which refer to data dimensions or model architecture, not specific hardware components. No specific hardware (GPU, CPU models, memory, etc.) is mentioned. |
| Software Dependencies | No | The paper does not explicitly list any software dependencies with version numbers, such as programming languages or libraries. |
| Experiment Setup | Yes | Description of training algorithm and hyperparameters: First, we describe the training protocol reported in Alg. 1: we separately update the first layer with T GD steps of learning rate η, followed by training with standard ridge regression for the second layer with fixed regularization strength λ. We vary adaptively the learning rate to satisfy the hypothesis of Thm. 5, i.e. η = O(p√(n/d)), and we take noiseless labels. If not stated otherwise, we consider fixed regularization strength λ = 1. We average over 10 different seeds to get the mean performance, and we use standard deviation for giving confidence intervals. |
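To make the protocol quoted above concrete, here is a minimal numpy sketch of the two-phase procedure the table describes (Alg. 1 of the paper): Gaussian covariates with a target depending on a few directions, T gradient steps on the first layer with a fresh batch per step, then ridge regression on the second layer over an independent batch. The specific activation, target function, dimensions, and scalings below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p, n = 64, 32, 512      # input dim, hidden width, batch size (assumed)
T, eta, lam = 3, 1.0, 1.0  # GD steps, learning rate, ridge strength (assumed)

# Illustrative single-index target y = f(<w*, z>); the paper allows general
# multi-index targets depending on finitely many relevant directions.
w_star = rng.standard_normal(d) / np.sqrt(d)

def target(z):
    return np.tanh(z @ w_star)

act = np.tanh                                 # assumed activation

W = rng.standard_normal((p, d)) / np.sqrt(d)  # first-layer weights
a = rng.standard_normal(p) / np.sqrt(p)       # second layer, fixed in phase 1

# Phase 1: T gradient steps on W, each on a fresh Gaussian batch (squared loss).
for t in range(T):
    Z = rng.standard_normal((n, d))           # fresh batch, z^nu ~ N(0, I_d)
    y = target(Z)
    pre = Z @ W.T                             # (n, p) preactivations
    resid = (act(pre) @ a) / p - y            # network output minus labels
    # gradient of (1/2n) sum resid^2 with respect to W
    grad = ((resid[:, None] * (1 - act(pre) ** 2) * a[None, :]).T @ Z) / (n * p)
    W -= eta * grad

# Phase 2: ridge regression for the second layer on an independent batch,
# using the features learned in phase 1.
Z2 = rng.standard_normal((n, d))
Phi = act(Z2 @ W.T)                           # (n, p) learned features
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ target(Z2))

# Evaluate mean squared error on fresh test data.
Zt = rng.standard_normal((n, d))
test_mse = np.mean((act(Zt @ W.T) @ a - target(Zt)) ** 2)
print(float(test_mse))
```

Averaging this over several seeds, as the setup row describes, would give the mean performance with standard-deviation error bars.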