Gradient Methods Never Overfit On Separable Data

Authors: Ohad Shamir

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we formally show that standard gradient methods (in particular, gradient flow, gradient descent and stochastic gradient descent) never overfit on separable data... We conclude our paper by providing a numerical experiment supporting our theoretical results on a real-world dataset.
Researcher Affiliation | Academia | Ohad Shamir EMAIL Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
Pseudocode | No | The paper describes methods using mathematical equations and text, but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of source code for the described methodology.
Open Datasets | Yes | Specifically, we used a linearly-separable subset of the well-known MNIST digit recognition dataset, focusing on the binary classification task of distinguishing between the digits 4 and 7 using a linear predictor.
Dataset Splits | Yes | Given this linearly-separable dataset, we randomly split it into a training set of size 10^4, and a test set.
Hardware Specification | No | The paper describes a numerical experiment but does not provide specific details about the hardware used to run these experiments.
Software Dependencies | No | The paper mentions using gradient descent with logistic loss and a quadratic programming solver but does not specify any software versions for reproducibility.
Experiment Setup | Yes | Then, we trained an unbiased linear classifier using gradient descent (with step size 1 and using the logistic loss) until all but 212 examples were correctly classified. After removing these examples, we were left with a dataset of 13905 examples, which by definition is linearly separable. Using a quadratic programming solver, we computed the exact max-margin predictor w, with a margin of γ := 0.0187... Given this linearly-separable dataset, we randomly split it into a training set of size 10^4, and a test set. We again ran gradient descent (with the logistic loss and step size of 1) on the training set.
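The gradient-descent part of the setup above can be sketched as follows. This is a minimal illustration, not the paper's code: synthetic Gaussian clusters stand in for the MNIST 4-vs-7 subset, and the dataset size, dimension, and iteration count are arbitrary choices; only the step size of 1, the logistic loss, and the unbiased linear predictor come from the paper.

```python
import numpy as np

# Minimal sketch of the reported experiment: an unbiased linear classifier
# trained by gradient descent with step size 1 on the logistic loss.
# Assumption: two synthetic Gaussian clusters stand in for the paper's
# linearly-separable MNIST 4-vs-7 subset; sizes and iterations are arbitrary.

rng = np.random.default_rng(0)
n, d = 200, 10
X = np.vstack([rng.normal(+2.0, 1.0, (n // 2, d)),
               rng.normal(-2.0, 1.0, (n // 2, d))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])  # labels in {+1, -1}

def logistic_loss(w):
    # mean log(1 + exp(-y * <w, x>)), computed stably via logaddexp
    return np.logaddexp(0.0, -y * (X @ w)).mean()

w = np.zeros(d)
for _ in range(500):
    margins = y * (X @ w)
    # sigma(-m) = exp(-log(1 + exp(m))) is a stable form of 1 / (1 + exp(m))
    sig = np.exp(-np.logaddexp(0.0, margins))
    grad = -(X.T @ (y * sig)) / n   # gradient of the mean logistic loss
    w -= 1.0 * grad                 # step size 1, as in the paper

# Normalized margin: positive iff every example is classified correctly.
gamma_hat = np.min(y * (X @ w)) / np.linalg.norm(w)
print(f"final loss = {logistic_loss(w):.2e}, normalized margin = {gamma_hat:.3f}")
```

On separable data such as this, the training loss keeps decreasing and the normalized margin stays positive, consistent with the paper's thesis that gradient descent does not overfit here.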
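The max-margin computation can likewise be sketched as a small quadratic program: minimize ½‖w‖² subject to y_i⟨w, x_i⟩ ≥ 1 for all i, whose solution gives margin γ = 1/‖w‖. The paper only says "a quadratic programming solver"; the sketch below uses SciPy's general-purpose SLSQP method on the same kind of synthetic separable data, which is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hedged sketch of the max-margin step: solve min 0.5*||w||^2 subject to
# y_i * <w, x_i> >= 1, so the margin of the solution is 1 / ||w||.
# Synthetic separable data replaces the paper's MNIST 4-vs-7 subset, and
# SLSQP is an assumed solver choice -- the paper does not name one.

rng = np.random.default_rng(0)
n, d = 200, 10
X = np.vstack([rng.normal(+2.0, 1.0, (n // 2, d)),
               rng.normal(-2.0, 1.0, (n // 2, d))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

res = minimize(
    fun=lambda w: 0.5 * w @ w,          # objective: half squared norm
    x0=np.zeros(d),
    jac=lambda w: w,
    constraints={"type": "ineq",        # y_i * <w, x_i> - 1 >= 0 for all i
                 "fun": lambda w: y * (X @ w) - 1.0,
                 "jac": lambda w: y[:, None] * X},
    method="SLSQP",
)
w_star = res.x
gamma = 1.0 / np.linalg.norm(w_star)    # max margin of the dataset
print(f"solver success: {res.success}, margin gamma = {gamma:.4f}")
```

The paper reports γ := 0.0187 for its MNIST subset; the value here is only for the synthetic stand-in data.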