Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Authors: Zhenfeng Tu, Santiago Tomas Aranguri Diaz, Arthur Jacot
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a surprisingly simple unifying formula for the evolution of the learned matrix that contains as special cases both lazy and balanced regimes but also a mixed regime in between the two. In this paper, we study this transition in the context of linear networks and focus mainly on the effects of the width w and the variance of the weights at initialization σ², and give a precise and almost complete phase diagram, showing the transitions between lazy and active regimes. Figure 1: For both plots, we train either using gradient descent or the self-consistent dynamics from equation (1), with the scaling γ_σ² = 1.85, γ_w = 2.25, which lies in the active regime. (Left panel): We plot train and test error for both dynamics. Figure 2: As a function of γ_σ², γ_w, we run GD and plot different quantities. |
| Researcher Affiliation | Academia | Zhenfeng Tu, Courant Institute, New York University, New York, NY 10012, EMAIL; Santiago Aranguri, Courant Institute, New York University, New York, NY 10012, EMAIL; Arthur Jacot, Courant Institute, New York University, New York, NY 10012, EMAIL |
| Pseudocode | No | The paper provides mathematical formulas and derivations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We use synthetic data, with a description of how to build this synthetic data. The experiments are only there for visualization purposes; we see no particular need to publish it. |
| Open Datasets | No | For all the experiments, we used the losses L_train(θ) = (1/d²) ‖A_θ − (A + E)‖²_F and L_test(θ) = (1/d²) ‖A_θ − A‖²_F, where E has i.i.d. N(0, 1) entries and A = K^{−1/2} Σ_{i=1}^{K} u_i v_iᵀ with u_i, v_i ~ N(0, Id_d) Gaussian vectors in R^d. This means that Rank A = K. The factor K^{−1/2} ensures that ‖A‖_F = Θ(d). |
| Dataset Splits | No | The paper mentions 'train and test error' and 'train error converged' but does not specify validation splits or proportions (e.g., 80/10/10 split or specific sample counts for validation). |
| Hardware Specification | Yes | Experiments took 12 hours of compute, using two GeForce RTX 2080 Ti (11 GB memory) and two TITAN V (12 GB memory) GPUs. |
| Software Dependencies | No | All the experiments were implemented in PyTorch [40]. |
| Experiment Setup | Yes | For the experiments in Figure 1, we took d = 500 and K = 5. For the experiments in Figure 2, we took d = 200 and K = 5. For making the contour plot, we took a grid with 35 points for γ_σ² ∈ [−3.0, 0.0] and 35 points for γ_w ∈ [0, 2.8]. For each of the 35² pairs of values for (γ_σ², γ_w), we ran gradient descent (and, for the lower-right plot, the self-consistent dynamics too) until the train error converged. Following Theorem 2, we take a learning rate η = d²/(c·w·σ²) for γ_σ² + γ_w > 1, and η = d²/(c·‖A‖_op) otherwise, where c is usually 50 but can be taken to be 2 or 5 for faster convergence at the cost of more unstable training. |