The Implicit Bias of Gradient Descent on Separable Data

Authors: Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

JMLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental A numerical illustration of the convergence is depicted in Figure 1. As predicted by the theory, the norm w(t) grows logarithmically (note the semi-log scaling), and w(t) converges to the max-margin separator, but only logarithmically, while the loss itself decreases very rapidly (note the log-log scaling). An important practical consequence of our theory, is that although the margin of w(t) keeps improving, and so we can expect the population (or test) misclassification error of w(t) to improve for many datasets, the same cannot be said about the expected population loss (or test loss)! ... These effects are demonstrated in Figure 2 and Table 1 which portray typical training of a convolutional neural network using unregularized gradient descent.
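The quoted behavior can be illustrated numerically. The following is a minimal sketch (not the paper's code, and using a simplified four-point dataset rather than the paper's Figure 1 setup): gradient descent on the logistic loss over separable data, where the norm of w(t) keeps growing slowly while its direction converges to the L2 max-margin separator.

```python
import numpy as np

# Sketch: gradient descent on logistic loss over a separable 2D dataset.
# Expected behavior (per the quoted passage): ||w(t)|| grows roughly
# logarithmically in t, while w(t)/||w(t)|| converges to the L2 max-margin
# direction. The dataset here is a symmetric toy example, an assumption
# of this sketch rather than the paper's exact Figure 1 data.
X = np.array([[0.5, 1.5], [1.5, 0.5], [-0.5, -1.5], [-1.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
eta = 0.1
norms = []
for t in range(20000):
    margins = np.minimum(y * (X @ w), 50.0)   # clip to avoid overflow in exp
    weights = 1.0 / (1.0 + np.exp(margins))   # per-sample sigmoid(-margin)
    grad = -(X * (y * weights)[:, None]).mean(axis=0)
    w -= eta * grad
    norms.append(np.linalg.norm(w))

direction = w / np.linalg.norm(w)
w_hat = np.array([1.0, 1.0]) / np.sqrt(2)  # known L2 max-margin direction here
```

Running this, `direction` aligns with `w_hat` while the norm keeps increasing, but only sub-linearly: the last norm is well under twice the norm reached at a tenth of the iterations, consistent with logarithmic growth.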
Researcher Affiliation Academia Department of Electrical Engineering, Technion, Haifa 320003, Israel; Toyota Technological Institute at Chicago, Chicago, Illinois 60637, USA
Pseudocode No The paper describes mathematical proofs, theorems, and derivations. It does not include a clearly labeled "Pseudocode" or "Algorithm" block, nor does it present structured steps in a code-like format.
Open Source Code Yes Code available here: https://github.com/paper-submissions/MaxMargin
Open Datasets Yes Training of a convolutional neural network on CIFAR10 using stochastic gradient descent with constant learning rate and momentum, softmax output and a cross entropy loss... and Visualization of our main results on a synthetic dataset in which the L2 max margin vector ŵ is precisely known.
Dataset Splits No Figure 2: Training of a convolutional neural network on CIFAR10 using stochastic gradient descent with constant learning rate and momentum, softmax output and a cross entropy loss, where we achieve 8.3% final validation error. Table 1: Sample values from various epochs in the experiment depicted in Fig. 2. ... Validation loss ... Validation error. The paper mentions the use of a validation set for CIFAR10 experiments but does not provide specific details on how the dataset was split (e.g., percentages or exact numbers of samples for training, validation, or test sets).
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only generally refers to training models.
Software Dependencies No The paper mentions optimization methods like "stochastic gradient descent" and "ADAM (Kingma and Ba, 2015)", but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup Yes Implementation details: The dataset includes four support vectors: x1 = (0.5, 1.5), x2 = (1.5, 0.5) with y1 = y2 = 1, and x3 = −x1, x4 = −x2 with y3 = y4 = −1 (the L2 normalized max margin vector is then ŵ = (1, 1)/√2 with margin equal to √2), and 12 other random datapoints (6 from each class) that are not on the margin. We used a learning rate η = 1/σ²_max(X), where σ_max(X) is the maximal singular value of X, momentum γ = 0.9 for GDMO, and initialized at the origin. and Training of a convolutional neural network on CIFAR10 using stochastic gradient descent with constant learning rate and momentum, softmax output and a cross entropy loss...
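The quoted synthetic setup can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the distribution of the 12 non-support points is not specified in the excerpt, so placing them uniformly well inside each class's half-plane is an assumption, and GDMO is implemented here as plain gradient descent with heavy-ball momentum γ = 0.9.

```python
import numpy as np

# Sketch of the quoted setup: four support vectors, 12 interior points
# (placement is an assumption), learning rate eta = 1/sigma_max(X)^2,
# momentum 0.9 (GDMO), initialized at the origin.
rng = np.random.default_rng(0)
sv = np.array([[0.5, 1.5], [1.5, 0.5], [-0.5, -1.5], [-1.5, -0.5]])
y_sv = np.array([1.0, 1.0, -1.0, -1.0])
# 6 points per class, drawn strictly inside each half-plane so they have
# larger margin than the support vectors (assumed distribution).
pos = rng.uniform(1.5, 3.0, size=(6, 2))
neg = -rng.uniform(1.5, 3.0, size=(6, 2))
X = np.vstack([sv, pos, neg])
y = np.concatenate([y_sv, np.ones(6), -np.ones(6)])

eta = 1.0 / np.linalg.svd(X, compute_uv=False)[0] ** 2  # eta = 1/sigma_max^2
w = np.zeros(2)   # initialized at the origin
v = np.zeros(2)   # momentum buffer, gamma = 0.9
for t in range(5000):
    m = np.minimum(y * (X @ w), 50.0)         # clip to avoid overflow in exp
    grad = -(X * (y / (1.0 + np.exp(m)))[:, None]).sum(axis=0)
    v = 0.9 * v - eta * grad
    w = w + v

u = w / np.linalg.norm(w)  # should approach the max-margin direction (1,1)/sqrt(2)
```

Because the 12 extra points lie farther from the decision boundary than the four support vectors, the max-margin direction stays ŵ = (1, 1)/√2, and the iterate's direction `u` approaches it.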