Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Authors: Roman Worschech, Bernd Rosenow
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 2: Generalization error ε_g for a linear activation function. Left: ε_g evaluated using Eq. (8) (blue) and Eq. (6) (orange) for N = 128, K = M = 1, σ̃²_J = 1, η = 1, and β = 1. Right: ε_g evaluated using Eq. (9) (dashed orange) compared to simulations averaged over 15 random initializations (solid blue), with N = L = 1024, β = 0.75, η = 0.01, and σ̃_J = 0.01. Figure 3: Generalization error ε_g for different trainable input dimensions N_l of the student network. Left: ε_g as a function of α for various N_l, with L = N = 256, K = M = 1, σ̃_J = 0.01, η = 0.05, and β = 1. The student network is trained on synthetic data and the teacher's outputs. Right: ε_g as a function of α, with L = N = 1024, K = M = 1, σ̃_J = 0.01, and η = 0.05. The student network is trained on the CIFAR-5m dataset (Nakkiran et al., 2021) using the teacher's outputs. We estimate the scaling exponent β ≈ 0.3 for this dataset. For the theoretical predictions, the empirical data spectrum is used to evaluate Eq. (11). Both plots compare the simulation results (solid curves) to the theoretical prediction from Eq. (11) (black dashed lines). Figure 6: Scaling behavior of the generalization error ε_g in the asymptotic regime for a non-linear activation function. Left: ε_g as a function of α for K = M = 40, η = 0.01, σ̃_J = 10⁻⁶, and L = N = 512, for simulations averaged over 10 different initializations. |
| Researcher Affiliation | Academia | Roman Worschech¹,² Bernd Rosenow¹ ¹Institut für Theoretische Physik, Universität Leipzig, Brüderstraße 16, 04103 Leipzig, Germany ²Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany |
| Pseudocode | No | The paper presents mathematical derivations, equations, and describes methods in text, but there are no explicitly labeled "Pseudocode" or "Algorithm" blocks, nor are there structured steps formatted like code. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, a link to a code repository, or mention of code being available in supplementary materials for the methodology described in this paper. |
| Open Datasets | Yes | The student network is trained on the CIFAR-5m dataset (Nakkiran et al., 2021) using the teacher's outputs. |
| Dataset Splits | No | The paper does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits with specific details) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or detailed computer specifications used for running the simulations or experiments. |
| Software Dependencies | No | The paper states: "In Appendix G, we utilized Julia, a high-level scripting language, with arbitrary precision arithmetic." It names Julia but does not specify a version number, nor does it list other key software components with versions (e.g., libraries or specific solvers). |
| Experiment Setup | Yes | Figure 2: Generalization error ε_g for a linear activation function. Left: ε_g evaluated using Eq. (8) (blue) and Eq. (6) (orange) for N = 128, K = M = 1, σ̃²_J = 1, η = 1, and β = 1. Right: ε_g evaluated using Eq. (9) (dashed orange) compared to simulations averaged over 15 random initializations (solid blue), with N = L = 1024, β = 0.75, η = 0.01, and σ̃_J = 0.01. Figure 4: Symmetric plateau for a non-linear activation function. Left and center: Plateau behavior of the order parameters for L = 10, N = 7000, σ̃_J = 0.01, η = 0.1, and M = K = 4, using one random initialization of the student and teacher vectors. |
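The experiment-setup quotes above describe online teacher-student training on Gaussian data with a power-law covariance spectrum. As a hedged illustration only (the paper releases no code, and all variable names and symbol assignments here are assumptions), the following minimal NumPy sketch mimics the flavor of the Figure 2 (right) setting: a linear single-unit teacher and student (K = M = 1), spectrum exponent 0.75, learning rate 0.01, and small initial student weights.

```python
import numpy as np

# Hypothetical illustration, NOT the authors' released code: online
# teacher-student training of a linear single-unit network (K = M = 1)
# on Gaussian data whose covariance eigenvalues follow a power law.
rng = np.random.default_rng(0)

N = 256           # input dimension (assumed; the paper's figure uses N = L = 1024)
beta = 0.75       # power-law exponent of the data spectrum (assumed symbol)
eta = 0.01        # SGD learning rate
sigma_J = 0.01    # scale of the student's initial weights
steps = 20000     # number of online SGD updates

# Data covariance spectrum: lambda_k proportional to k^(-beta)
lam = np.arange(1, N + 1, dtype=float) ** (-beta)

B = rng.standard_normal(N)            # teacher weight vector
J = sigma_J * rng.standard_normal(N)  # student weight vector

def eps_g(J, B, lam):
    """Generalization error E[(J.x - B.x)^2] / 2 for a linear activation."""
    d = J - B
    return 0.5 * np.sum(lam * d**2)

eps_initial = eps_g(J, B, lam)
for _ in range(steps):
    # Draw one fresh example: independent Gaussian modes with variances lam
    x = np.sqrt(lam) * rng.standard_normal(N)
    J -= eta * (J @ x - B @ x) * x    # online SGD step on the squared loss
eps_final = eps_g(J, B, lam)
```

In this toy setting the error decays mode by mode, with the slow tail of the spectrum governing the asymptotic power law; an analytic prediction such as the paper's Eq. (11) is what simulations of this kind would be compared against.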