Early Alignment in Two-Layer Networks Training is a Two-Edged Sword

Authors: Etienne Boursier, Nicolas Flammarion

JMLR 2025

Reproducibility Assessment (Variable: Result — LLM Response)
Research Type: Experimental — This section empirically illustrates the results of Theorems 1 and 2. The considered dataset does not exactly fit the conditions of Theorem 2, to illustrate that Assumption 3 with η < 1/6 is only needed for analytical purposes. The dataset is however similar to datasets satisfying Assumption 3 (see e.g., Figure 1) in the sense that all three data points are positively correlated, with positive labels, and the middle point is below the optimal linear regressor. [...] Figure 2 illustrates the training dynamics over time.
Researcher Affiliation: Academia — Etienne Boursier (EMAIL), Université Paris-Saclay, CNRS, Inria, Laboratoire de mathématiques d'Orsay, 91405, Orsay, France; Nicolas Flammarion (EMAIL), TML Lab, EPFL, Switzerland.
Pseudocode: No — The paper describes its methods and theoretical analysis through mathematical equations and proofs. It mentions a 'proof sketch' but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code: Yes — The code and animated versions of the figures are also available at github.com/eboursier/early_alignment.
Open Datasets: No — We consider the following 3-point data example (n = 3 in this section). Assumption 3. The data is given by 3 points (xk, yk) ∈ R^3, for some η > 0: x1 ∈ (−1, −1 + η] × [1, 1 + η] and y1 ∈ [1, 1 + η]; x2 ∈ [−η, η] × [1 − η, 1 + η] and y2 ∈ (0, η]; x3 ∈ [1 − η, 1) × [1, 1 + η] and y3 ∈ [1, 1 + η]. [...] In Section 5, we considered the following univariate 3-point dataset: x1 = 0.75 and y1 = 1.1; x2 = 0.5 and y2 = 0.1; x3 = 0.125 and y3 = 0.8. [...] Precisely, we consider 40 univariate data points xi sampled uniformly at random in [−1, 1].
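Since no public dataset is used, the quoted examples are easy to regenerate. A minimal NumPy sketch (the random seed is an arbitrary choice of ours; the paper does not state one):

```python
import numpy as np

# The univariate 3-point dataset quoted from Section 5 of the paper.
x3pts = np.array([0.75, 0.5, 0.125])
y3pts = np.array([1.1, 0.1, 0.8])

# The larger example: 40 univariate points x_i sampled uniformly at
# random in [-1, 1] (seed chosen here for reproducibility only).
rng = np.random.default_rng(0)
x40 = rng.uniform(-1.0, 1.0, size=40)
```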
Dataset Splits: No — The paper uses custom-generated simple data examples to illustrate theoretical results and training dynamics. It does not perform evaluations requiring standard training, validation, or test splits, and therefore no such split information is provided.
Hardware Specification: No — The paper does not provide specific hardware details (such as exact GPU models, CPU types, or memory configurations) used for running its experiments.
Software Dependencies: No — The paper mentions a 'ReLU network with gradient descent' 'trained with m = 200 000 neurons' using a 'learning rate 10^-3', but it does not specify any particular software libraries, frameworks, or their version numbers (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup: Yes — The activation function is ReLU, and the initialisation follows Equation (3) with λ = 10^-3 and wj ~ N(0, I2), aj = sj‖wj‖ with sj ~ U({−1, 1}). [...] Lastly, the neural network is trained with m = 200 000 neurons to approximate the infinite-width regime. We ran gradient descent with learning rate 10^-3 for up to 2 million iterations.
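The quoted setup can be sketched end-to-end. The block below is a reduced illustration, not the paper's code: it uses far fewer neurons and iterations than the paper, reads Equation (3) as scaling both layers by λ, appends a bias coordinate to the univariate inputs, and assumes a squared loss averaged over the n points; the seed is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Section 5's univariate 3-point data, with a bias coordinate appended
# so that inputs are 2-dimensional (our assumption for this sketch).
X = np.array([[0.75, 1.0], [0.5, 1.0], [0.125, 1.0]])
y = np.array([1.1, 0.1, 0.8])
n = len(y)

# Initialisation in the spirit of Equation (3): w_j ~ N(0, I_2),
# a_j = s_j * ||w_j|| with s_j ~ U({-1, 1}), both layers scaled by λ.
m = 2_000          # reduced from the paper's 200 000 neurons
lam = 1e-3
w = rng.standard_normal((m, 2))
a = rng.choice([-1.0, 1.0], size=m) * np.linalg.norm(w, axis=1)
w, a = lam * w, lam * a

def predict(X, w, a):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(<w_j, x>)."""
    return np.maximum(X @ w.T, 0.0) @ a

def loss(X, y, w, a):
    return 0.5 * np.mean((predict(X, w, a) - y) ** 2)

loss_start = loss(X, y, w, a)

# Plain gradient descent with learning rate 1e-3 as in the paper, but
# for far fewer than the paper's 2 million iterations.
lr = 1e-3
for _ in range(2_000):
    pre = X @ w.T                         # (n, m) pre-activations
    err = np.maximum(pre, 0.0) @ a - y    # residuals, shape (n,)
    grad_a = np.maximum(pre, 0.0).T @ err / n
    grad_w = ((err[:, None] * (pre > 0)) * a).T @ X / n
    a -= lr * grad_a
    w -= lr * grad_w

loss_end = loss(X, y, w, a)
```

With the small λ, the early iterations barely change the loss while the neurons rotate — the "early alignment" phase the theorems describe — so the loss decrease over these few steps is modest.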