Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

Authors: Jingfeng Wu, Peter Bartlett, Matus Telgarsky, Bin Yu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1: The logistic risk and zero-one error along the GD path for an overparameterized logistic regression problem. Here d = 2000, n = 1000, λ_i = i^{-2}, w*_{0:100} = 1, and w*_{100:} = 0. The optimization length is measured by ηt. The plots show that the excess logistic risk and excess zero-one error are both small for GD with appropriate early stopping, and that both grow larger when GD enters the interpolation regime, demonstrating the regularization effect of early stopping in GD.
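The experiment described in the Figure 1 caption can be sketched end to end, scaled down for speed. The paper releases no code, so everything here is an assumption: the function names are illustrative, and the i^{-2} eigendecay is a reconstruction of the garbled caption. The sketch tracks the training logistic risk and zero-one error along the GD path (the paper's figure plots excess population quantities):

```python
import numpy as np

def run_gd_path(n=200, d=400, eta=0.5, steps=300, seed=0):
    """Track logistic risk and zero-one error along the GD path on
    synthetic data mimicking the Figure 1 setup (scaled down)."""
    rng = np.random.default_rng(seed)
    lam = np.arange(1, d + 1) ** -2.0               # spectrum lambda_i = i^{-2}
    X = rng.standard_normal((n, d)) * np.sqrt(lam)  # anisotropic Gaussian design
    w_star = np.zeros(d)
    w_star[:20] = 1.0                               # leading signal coordinates
    p = 1.0 / (1.0 + np.exp(-X @ w_star))           # logistic label model
    y = np.where(rng.random(n) < p, 1.0, -1.0)
    w = np.zeros(d)                                 # w_0 = 0
    risks, errs = [], []
    for _ in range(steps):
        m = y * (X @ w)                             # margins y_i <x_i, w>
        risks.append(np.mean(np.logaddexp(0.0, -m)))  # logistic risk
        errs.append(np.mean(m <= 0))                  # zero-one error
        grad = -(X.T @ ((1.0 / (1.0 + np.exp(m))) * y)) / n
        w -= eta * grad
    return np.array(risks), np.array(errs)
```

Early stopping as studied in the paper would pick an iterate t along this path (e.g., by held-out risk) rather than running GD to interpolation.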
Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Google DeepMind; ³New York University. Correspondence to: Jingfeng Wu <EMAIL>, Peter L. Bartlett <EMAIL>, Matus Telgarsky <EMAIL>, Bin Yu <EMAIL>.
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The gradient descent steps are described with mathematical equations rather than structured algorithmic formatting, for example: 'w_0 = 0, w_{t+1} = w_t − η∇L̂(w_t), t ≥ 0. (GD)'
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository.
Open Datasets | No | 'We focus on a well-specified setting where the feature vector follows an anisotropic Gaussian design and the binary label conditional on the feature is given by a logistic model (see Assumption 1 in Section 2).' This describes a synthetic data-generation process rather than an existing publicly available dataset with concrete access information.
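This synthetic generation process can be sketched as follows. The function name and the i^{-2} eigendecay (reconstructed from the garbled Figure 1 caption) are assumptions, not the authors' code:

```python
import numpy as np

def sample_logistic_data(n=1000, d=2000, k=100, seed=0):
    """Sample n pairs (x, y): x ~ N(0, diag(lambda)) with
    lambda_i = i^{-2}, and y | x drawn from a logistic model."""
    rng = np.random.default_rng(seed)
    lam = np.arange(1, d + 1) ** -2.0               # eigenvalues lambda_i = i^{-2}
    X = rng.standard_normal((n, d)) * np.sqrt(lam)  # anisotropic Gaussian features
    w_star = np.zeros(d)
    w_star[:k] = 1.0                                # first k signal coordinates = 1
    p = 1.0 / (1.0 + np.exp(-X @ w_star))           # P(y = +1 | x), logistic model
    y = np.where(rng.random(n) < p, 1.0, -1.0)      # labels in {-1, +1}
    return X, y, w_star
```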
Dataset Splits | No | 'Let (x_i, y_i)_{i=1}^n be n independent copies of (x, y). Define the empirical risk as L̂(w) := (1/n) Σ_{i=1}^n ℓ(y_i x_i^⊤ w), w ∈ H.' The paper describes data generation and a sample size n, but does not specify any training/validation/test splits for experimental reproduction.
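With ℓ the logistic loss ℓ(z) = log(1 + e^{-z}), the empirical risk above amounts to a one-liner (a sketch with an illustrative function name):

```python
import numpy as np

def empirical_logistic_risk(w, X, y):
    """L_hat(w) = (1/n) * sum_i log(1 + exp(-y_i <x_i, w>))."""
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    return np.mean(np.logaddexp(0.0, -margins))
```

At the GD initialization w = 0 every margin is zero, so the risk equals log 2.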
Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not name ancillary software with version numbers. It refers to general methods such as gradient descent and logistic regression, not to specific libraries or frameworks.
Experiment Setup | No | 'The iterates of gradient descent (GD) are given by w_0 = 0, w_{t+1} = w_t − η∇L̂(w_t), t ≥ 0, (GD) where η > 0 is a fixed stepsize.' Although a stepsize η is mentioned, specific values or ranges for experimental hyperparameters such as learning rate, batch size, epochs, or optimizer settings are not provided. The parameters in the Figure 1 caption (d = 2000, n = 1000, λ_i = i^{-2}, w*_{0:100} = 1 and w*_{100:} = 0) describe the data-generation model, not the experimental setup.
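The (GD) recursion quoted above is full-batch gradient descent on the empirical logistic risk, which is why batch size does not apply. A minimal self-contained sketch (the function name and stopping after a fixed number of steps are illustrative choices, not from the paper):

```python
import numpy as np

def gd_logistic(X, y, eta=0.5, steps=100):
    """Run (GD): w_0 = 0, w_{t+1} = w_t - eta * grad L_hat(w_t),
    where L_hat is the empirical logistic risk."""
    n, d = X.shape
    w = np.zeros(d)                            # w_0 = 0
    for _ in range(steps):
        margins = y * (X @ w)                  # m_i = y_i <x_i, w>
        # grad L_hat(w) = -(1/n) * sum_i sigmoid(-m_i) * y_i * x_i
        sig = 1.0 / (1.0 + np.exp(margins))    # sigmoid(-m_i)
        grad = -(X.T @ (sig * y)) / n
        w = w - eta * grad                     # fixed-stepsize update
    return w
```

Early stopping, the paper's subject, corresponds to returning w at some intermediate t rather than iterating toward the interpolation regime.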