Deep linear networks can benignly overfit when shallow ones do

Authors: Niladri S. Chatterji, Philip M. Long

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced. Figure 1 contains plots from simulation experiments where the excess risk of a deep linear model increases with the scale of the initialization of the first layer, as in the upper bounds of our analysis. Inspired by our theory, we ran simulations to study the excess risk of several linear networks and ReLU networks as a function of both the initialization scale and dimension.
Researcher Affiliation | Collaboration | Niladri S. Chatterji (EMAIL), Computer Science Department, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305. Philip M. Long (EMAIL), Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043.
Pseudocode | No | The paper describes theoretical proofs and simulation results, but no pseudocode or algorithm blocks are explicitly presented.
Open Source Code | Yes | Code at https://github.com/niladri-chatterji/Benign-Deep-Linear
Open Datasets | No | The generative model for the underlying data was y = xΘ + ω, where x ∼ N(0, Σ) and ω ∼ N(0, 1)... the generative model for the underlying data was y = f(x) + ω, where x ∼ N(0, I_{10×10}) and ω ∼ N(0, 1). The paper describes a synthetic data-generation process rather than using or providing a publicly available dataset with concrete access information.
Dataset Splits | No | The model is trained on n = 100 points drawn from the generative model y = xΘ + ω... The networks are trained on n = 500 samples... The excess risk is defined as E_{x,y}[(y − xΘ̂)² − (y − xΘ*)²], where x, y are test samples that are independent of Θ̂. While the paper states the number of training samples and evaluation on independent test samples, it does not specify explicit dataset split percentages, counts, or a detailed splitting methodology for reproducibility.
Hardware Specification | No | The paper describes simulation setups and training procedures, but does not provide any specific details about the hardware used to run these experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions training models with 'full-batch gradient descent' and 'ReLU networks,' but does not provide specific software names or version numbers (e.g., Python, PyTorch, TensorFlow versions) used for implementation.
Experiment Setup | Yes | All of the models are trained on the squared loss with full-batch gradient descent with step-size 10^-4, until the training loss is smaller than 10^-7. We train models that have 2 hidden layers (L = 3). The width of the middle layers m is set to be 10(d + q), where d is the input dimension and q is the output dimension. ... The width of the middle layers (m) is set to be 50.
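For concreteness, the quoted setup (synthetic data from y = xΘ + ω, a 3-layer deep linear network, full-batch gradient descent on the squared loss until the training loss falls below a threshold, and the excess-risk metric) can be sketched as below. This is a minimal NumPy illustration, not the authors' code (their repository is linked in the table): it assumes x ∼ N(0, I_d), scalar outputs (q = 1), and hand-derived gradients; the function names `make_data`, `train_deep_linear`, and `excess_risk` are ours.

```python
import numpy as np

def make_data(n, d, theta, rng):
    """Draw n samples from y = x.theta + omega, x ~ N(0, I_d), omega ~ N(0, 1)."""
    X = rng.standard_normal((n, d))
    y = X @ theta + rng.standard_normal(n)
    return X, y

def train_deep_linear(X, y, width, init_scale, lr=1e-4, tol=1e-7,
                      max_iter=100_000, rng=None):
    """Full-batch GD on the squared loss for the product map x -> x W1 W2 W3."""
    n, d = X.shape
    # init_scale controls the first layer's initialization (the quantity the
    # quoted experiments vary); the other layers use a 1/sqrt(width) scale.
    W1 = init_scale * rng.standard_normal((d, width))
    W2 = rng.standard_normal((width, width)) / np.sqrt(width)
    W3 = rng.standard_normal((width, 1)) / np.sqrt(width)
    loss = np.inf
    for _ in range(max_iter):
        resid = X @ W1 @ W2 @ W3 - y[:, None]   # (n, 1) residuals
        loss = float((resid ** 2).mean())
        if loss < tol:                          # stopping rule quoted above
            break
        G = (2.0 / n) * X.T @ resid             # dLoss / d(W1 W2 W3)
        g1 = G @ (W2 @ W3).T                    # chain rule through the product
        g2 = W1.T @ G @ W3.T
        g3 = (W1 @ W2).T @ G
        W1 -= lr * g1
        W2 -= lr * g2
        W3 -= lr * g3
    theta_hat = (W1 @ W2 @ W3).ravel()          # end-to-end linear predictor
    return theta_hat, loss

def excess_risk(theta_hat, theta):
    """For x ~ N(0, I_d), the excess risk reduces to ||theta_hat - theta||^2."""
    return float(np.sum((theta_hat - theta) ** 2))
```

Sweeping `init_scale` (and the input dimension d) and plotting `excess_risk` against it would mimic the experiment the "Research Type" row describes; the paper's own runs use larger n, width m = 10(d + q) or 50, step-size 10^-4, and the 10^-7 stopping threshold quoted above.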