Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

Authors: Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory. Keywords: Stochastic Gradient Descent, Implicit Bias, ReLU Activation, Overparameterized Models, Mean-Field ... We validate our findings with numerical simulations for different regression tasks in Section 7.
Researcher Affiliation | Academia | Alexander Shevchenko EMAIL Institute of Science and Technology Austria 3400 Klosterneuburg, Austria; Vyacheslav Kungurtsev EMAIL Department of Computer Science Czech Technical University in Prague 166 36 Prague, Czechia; Marco Mondelli EMAIL Institute of Science and Technology Austria 3400 Klosterneuburg, Austria
Pseudocode | No | The paper presents theoretical models and mathematical proofs, detailing the SGD update rule in equation (3.3), but does not present it, or any other procedure, as structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. The license information provided refers to the paper itself, not accompanying code.
Open Datasets | No | We consider the following dataset which consists of two points: D = {(−x, y), (x, y)} = {(−10, 2), (10, 2)} (Section 6.1). The results for two different unidimensional datasets are reported in Figures 7 and 8 (Section 7). The datasets used are custom, small, and defined directly within the text or visually represented in figures; no external, publicly available datasets are referenced with access information.
Dataset Splits | No | The paper utilizes custom, small datasets (e.g., two data points as described in Section 6.1, or other unidimensional datasets shown in figures). These simple datasets do not involve conventional training/testing/validation splits, and no information about such splits is provided.
Hardware Specification | No | The paper mentions running numerical simulations and experiments in Section 7, but it does not specify any hardware details such as GPU models, CPU types, or other computing infrastructure used for these simulations.
Software Dependencies | No | The paper describes the use of the SGD iteration (3.3) for training, but it does not specify any software dependencies by name or version number (e.g., Python, TensorFlow, PyTorch, CUDA versions).
Experiment Setup | Yes | We run the SGD iteration (3.3) (no momentum or weight decay, batch size equal to 1), and we plot the resulting predictor once the algorithm has converged. The learning rate is s_k = 1, the total number of training epochs required for SGD to converge is roughly 5 * 10^4, and no ℓ2 regularization is enforced (λ = 0). As predicted by our theoretical findings, the predictor approaches a piecewise linear function whose number of tangent changes (or knots) is proportional to the number of training samples (and not to the width of the network): if β^-1 = 0.005, the predictor is still rather smooth; if β^-1 = 10^-4, the predictor sharpens, except for a smoother tangent change in the interval [4, 5]; and finally if β^-1 = 0, the predictor is piecewise linear. (Section 7)
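The setup extracted above can be sketched in code. This is a minimal reconstruction, not the paper's implementation: the one-hidden-layer architecture with mean-field 1/N scaling, the squared loss, the Gaussian initialization, and the Langevin-style noise term sqrt(2 * s * β^-1) per update are assumptions filled in around the stated hyperparameters (batch size 1, s_k = 1, λ = 0, temperature β^-1), and the epoch count is shortened from the roughly 5 * 10^4 reported.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-point dataset from Section 6.1: D = {(-10, 2), (10, 2)}.
X = np.array([-10.0, 10.0])
Y = np.array([2.0, 2.0])

# Wide one-hidden-layer ReLU network with mean-field scaling (assumed):
# f(x) = (1/N) * sum_i a_i * relu(w_i * x + b_i)
N = 1000
a = rng.normal(size=N)
w = rng.normal(size=N)
b = rng.normal(size=N)

def predict(x):
    return np.maximum(w * x + b, 0.0) @ a / N

lr = 1.0          # learning rate s_k = 1 (Section 7)
beta_inv = 1e-4   # temperature beta^-1; beta^-1 = 0 recovers noiseless SGD
lam = 0.0         # no l2 regularization (lambda = 0)

for epoch in range(2000):                  # shortened from ~5e4 epochs
    for j in rng.permutation(len(X)):      # batch size 1, no momentum
        x, y = X[j], Y[j]
        pre = w * x + b
        act = np.maximum(pre, 0.0)
        err = act @ a / N - y              # residual of the (assumed) squared loss
        ind = (pre > 0).astype(float)      # ReLU subgradient
        ga = err * act / N + lam * a
        gw = err * a * ind * x / N + lam * w
        gb = err * a * ind / N + lam * b
        # Langevin-style noisy update (assumed form of iteration (3.3)):
        # theta <- theta - s * grad + sqrt(2 * s * beta^-1) * xi
        scale = np.sqrt(2.0 * lr * beta_inv)
        a += -lr * ga + scale * rng.normal(size=N)
        w += -lr * gw + scale * rng.normal(size=N)
        b += -lr * gb + scale * rng.normal(size=N)
```

With β^-1 = 0 the noise term vanishes and the loop reduces to plain single-sample SGD; sweeping β^-1 over {0.005, 10^-4, 0} mirrors the smooth-to-piecewise-linear transition the paper reports for the learned predictor.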