Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

Authors: Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory. Keywords: Stochastic Gradient Descent, Implicit Bias, ReLU Activation, Overparameterized Models, Mean-Field ... We validate our findings with numerical simulations for different regression tasks in Section 7.
Researcher Affiliation | Academia | Alexander Shevchenko EMAIL Institute of Science and Technology Austria 3400 Klosterneuburg, Austria; Vyacheslav Kungurtsev EMAIL Department of Computer Science Czech Technical University in Prague 166 36 Prague, Czechia; Marco Mondelli EMAIL Institute of Science and Technology Austria 3400 Klosterneuburg, Austria
Pseudocode | No | The paper presents theoretical models and mathematical proofs, detailing the SGD update rule in equation (3.3), but does not present it, or any other procedure, as structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. The license information provided refers to the paper itself, not accompanying code.
Open Datasets | No | We consider the following dataset which consists of two points: D = {(−x, y), (x, y)} = {(−10, 2), (10, 2)} (Section 6.1). The results for two different unidimensional datasets are reported in Figures 7 and 8 (Section 7). The datasets used are custom, small, and defined directly within the text or visually represented in figures; no external, publicly available datasets are referenced with access information.
Dataset Splits | No | The paper utilizes custom, small datasets (e.g., two data points as described in Section 6.1, or other unidimensional datasets shown in figures). These simple datasets do not involve conventional training/testing/validation splits, and no information about such splits is provided.
Hardware Specification | No | The paper mentions running numerical simulations and experiments in Section 7, but it does not specify any hardware details such as GPU models, CPU types, or other computing infrastructure used for these simulations.
Software Dependencies | No | The paper describes the use of the SGD iteration (3.3) for training, but it does not specify any software dependencies by name or version number (e.g., Python, TensorFlow, PyTorch, CUDA versions).
Experiment Setup | Yes | We run the SGD iteration (3.3) (no momentum or weight decay, batch size equal to 1), and we plot the resulting predictor once the algorithm has converged. The learning rate is s_k = 1, the total number of training epochs required for SGD to converge is roughly 5 * 10^4, and no ℓ2 regularization is enforced (λ = 0). As predicted by our theoretical findings, the predictor approaches a piecewise linear function whose number of tangent changes (or knots) is proportional to the number of training samples (and not to the width of the network): if β^-1 = 0.005, the predictor is still rather smooth; if β^-1 = 10^-4, the predictor sharpens, except for a smoother tangent change in the interval [4, 5]; and finally if β^-1 = 0, the predictor is piecewise linear. (Section 7)
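The setup extracted above can be sketched in code. This is a minimal reconstruction, not the paper's implementation: the one-hidden-layer architecture with mean-field 1/N scaling, the squared loss, the Gaussian initialization, and the Langevin-style noise term sqrt(2 * s * β^-1) per update are assumptions filled in around the stated hyperparameters (batch size 1, s_k = 1, λ = 0, temperature β^-1), and the epoch count is shortened from the roughly 5 * 10^4 reported.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-point dataset from Section 6.1: D = {(-10, 2), (10, 2)}.
X = np.array([-10.0, 10.0])
Y = np.array([2.0, 2.0])

# Wide one-hidden-layer ReLU network with mean-field scaling (assumed):
# f(x) = (1/N) * sum_i a_i * relu(w_i * x + b_i)
N = 1000
a = rng.normal(size=N)
w = rng.normal(size=N)
b = rng.normal(size=N)

def predict(x):
    return np.maximum(w * x + b, 0.0) @ a / N

lr = 1.0          # learning rate s_k = 1 (Section 7)
beta_inv = 1e-4   # temperature beta^-1; beta^-1 = 0 recovers noiseless SGD
lam = 0.0         # no l2 regularization (lambda = 0)

for epoch in range(2000):                  # shortened from ~5e4 epochs
    for j in rng.permutation(len(X)):      # batch size 1, no momentum
        x, y = X[j], Y[j]
        pre = w * x + b
        act = np.maximum(pre, 0.0)
        err = act @ a / N - y              # residual of the (assumed) squared loss
        ind = (pre > 0).astype(float)      # ReLU subgradient
        ga = err * act / N + lam * a
        gw = err * a * ind * x / N + lam * w
        gb = err * a * ind / N + lam * b
        # Langevin-style noisy update (assumed form of iteration (3.3)):
        # theta <- theta - s * grad + sqrt(2 * s * beta^-1) * xi
        scale = np.sqrt(2.0 * lr * beta_inv)
        a += -lr * ga + scale * rng.normal(size=N)
        w += -lr * gw + scale * rng.normal(size=N)
        b += -lr * gb + scale * rng.normal(size=N)
```

With β^-1 = 0 the noise term vanishes and the loop reduces to plain single-sample SGD; sweeping β^-1 over {0.005, 10^-4, 0} mirrors the smooth-to-piecewise-linear transition the paper reports for the learned predictor.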