Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks
Authors: Etienne Boursier, Nicolas Flammarion
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work illustrates, on a simple linear example, the phenomenon of non-convergence of the parameters towards a global minimum of the training loss, despite overparametrization. This non-convergence actually yields a simplicity bias on the final estimator, which can lead to an optimal fit of the true data distribution. A similar phenomenon has been observed in more complex and realistic settings (Yoon et al., 2023; Kadkhodaie et al., 2024; Raventós et al., 2024); however, a theoretical analysis remains out of reach in these cases. It is still unclear whether the observed non-convergence arises from the early alignment mechanism proposed in our work, from stability issues as suggested by Qiao et al. (2024), from other factors, or from a combination of these effects. Our result is proven via the description of the early alignment phase. Besides the specific data example considered in Section 4, we also provide concentration bounds on the extremal vectors driving this early alignment. We believe these bounds (Theorem 3.1) can be used in subsequent works to better understand this early phase of the training dynamics, and how it yields biases towards simple estimators. |
| Researcher Affiliation | Academia | INRIA, LMO, Université Paris-Saclay, Orsay, France; TML Lab, EPFL, Switzerland. Correspondence to: Etienne Boursier <EMAIL>. |
| Pseudocode | No | The paper includes mathematical derivations and descriptions of algorithms (like gradient flow) but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | All the experiments were run on a personal MacBook Pro, for a total compute time of approximately 100 hours. The code can be found at github.com/eboursier/simplicity_bias. |
| Open Datasets | No | The paper uses synthetic data generated according to a linear model: "yk = β xk + ηk, where ηk are drawn i.i.d. as centered Gaussian of variance σ2 = 0.09, xk are drawn i.i.d. as centered Gaussian variables and β is fixed, without loss of generality, to β = (1, 0, . . . , 0)." This data is generated by the authors, and no public access or repository is provided for specific instances of the generated datasets. |
| Dataset Splits | No | The paper describes generating training samples and evaluating train and test losses, but does not explicitly define fixed training, validation, and test splits from a pre-existing dataset. It varies the 'number of training samples' (n) but generates new data for each run. |
| Hardware Specification | Yes | All the experiments were run on a personal MacBook Pro, for a total compute time of approximately 100 hours. |
| Software Dependencies | No | The paper mentions "pytorch default hyperparameters" in Appendix A.3 but does not specify a version number for PyTorch or any other software library. |
| Experiment Setup | Yes | The neural networks are trained via stochastic gradient descent (SGD), with batch size 32 and learning rate 0.01. To ensure that we reached convergence of the parameters, we train the networks for 8 × 10^6 iterations of SGD, where the training seems stabilized. |
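The synthetic data generation and training protocol described in the table can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' released code: the input dimension `d`, sample count `n`, hidden width `m`, and number of epochs are illustrative assumptions, and the run is truncated far short of the paper's 8 × 10^6 SGD iterations.

```python
import torch

torch.manual_seed(0)
d, n, m = 10, 200, 100                # input dim, samples, hidden width (illustrative)

# Linear teacher model from the paper: y_k = <beta, x_k> + eta_k,
# with beta = (1, 0, ..., 0), x_k centered Gaussian, eta_k ~ N(0, sigma^2), sigma^2 = 0.09.
beta = torch.zeros(d)
beta[0] = 1.0
X = torch.randn(n, d)
y = X @ beta + 0.3 * torch.randn(n)   # noise std 0.3, i.e. variance 0.09

# Two-layer ReLU network trained with SGD, batch size 32, learning rate 0.01,
# as stated in the experiment setup row above.
model = torch.nn.Sequential(
    torch.nn.Linear(d, m),
    torch.nn.ReLU(),
    torch.nn.Linear(m, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(X, y.unsqueeze(1)),
    batch_size=32,
    shuffle=True,
)

for epoch in range(5):                # the paper trains for ~8e6 iterations
    for xb, yb in loader:
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        opt.step()
```

Since each run draws fresh Gaussian samples, varying `n` here reproduces the paper's practice of generating new data per run rather than splitting a fixed dataset.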