Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
Authors: Jingfeng Wu, Peter Bartlett, Matus Telgarsky, Bin Yu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1. The logistic risk and zero-one error along the GD path for an overparameterized logistic regression problem. Here d = 2000, n = 1000, λ_i = i^{-2}, w*_{0:100} = 1, and w*_{100:} = 0. The optimization length is measured by ηt. The plots show that the excess logistic risk and excess zero-one error are both small for GD with appropriate early stopping, and both grow larger when GD enters the interpolation regime. These demonstrate the regularization effect of early stopping in GD. |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley ²Google DeepMind ³New York University. Correspondence to: Jingfeng Wu <EMAIL>, Peter L. Bartlett <EMAIL>, Matus Telgarsky <EMAIL>, Bin Yu <EMAIL>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The gradient descent steps are described using mathematical equations rather than structured algorithmic formatting, for example: 'w_0 = 0, w_{t+1} = w_t − η∇L̂(w_t), t ≥ 0 (GD)'. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | No | We focus on a well-specified setting where the feature vector follows an anisotropic Gaussian design and the binary label conditional on the feature is given by a logistic model (see Assumption 1 in Section 2). This describes a synthetic data generation process rather than referencing an existing publicly available dataset with concrete access information. |
| Dataset Splits | No | Let (x_i, y_i)_{i=1}^n be n independent copies of (x, y). Define the empirical risk as L̂(w) := (1/n) Σ_{i=1}^n ℓ(y_i x_i^⊤ w), w ∈ H. The paper describes data generation and a sample size n, but does not specify any training/validation/test splits for experimental reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It refers to general methods like 'gradient descent' and 'logistic regression' but not specific libraries or frameworks. |
| Experiment Setup | No | The iterates of gradient descent (GD) are given by w_0 = 0, w_{t+1} = w_t − η∇L̂(w_t), t ≥ 0 (GD), where η > 0 is a fixed stepsize. While a stepsize η is mentioned, specific values or ranges for experimental hyperparameters such as the learning rate, batch size, number of epochs, or optimizer settings are not provided. The parameters in the Figure 1 caption (d = 2000, n = 1000, λ_i = i^{-2}, w*_{0:100} = 1, and w*_{100:} = 0) describe the data generation model, not the experimental setup. |
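The setup extracted in the table above (anisotropic Gaussian features with λ_i = i^{-2}, logistic labels, and plain GD on the empirical logistic risk) can be sketched in a few lines of NumPy. This is a minimal illustration of the described data-generation process and update rule, not the authors' code: the dimensions are scaled down from the d = 2000, n = 1000 of Figure 1, and the stepsize and iteration count are assumptions, since the paper only requires a fixed η > 0.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta, steps = 200, 100, 0.5, 200  # scaled down from d = 2000, n = 1000

# Anisotropic Gaussian design: x ~ N(0, diag(lambda_1, ..., lambda_d)),
# with eigenvalues lambda_i = i^{-2} as in the Figure 1 caption.
lam = np.arange(1, d + 1) ** -2.0
X = rng.standard_normal((n, d)) * np.sqrt(lam)

# Ground-truth parameter: first 10% of coordinates set to 1, the rest 0
# (mirrors w*_{0:100} = 1, w*_{100:} = 0 at the scaled-down dimension).
w_star = np.zeros(d)
w_star[: d // 10] = 1.0

# Well-specified logistic model: P(y = +1 | x) = sigmoid(x^T w*).
p = 1.0 / (1.0 + np.exp(-X @ w_star))
y = np.where(rng.random(n) < p, 1.0, -1.0)

def logistic_risk(w):
    """Empirical risk L_hat(w) = (1/n) sum_i log(1 + exp(-y_i x_i^T w))."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

# GD: w_0 = 0, w_{t+1} = w_t - eta * grad L_hat(w_t).
w = np.zeros(d)
risks = []
for t in range(steps):
    margins = y * (X @ w)
    # Gradient of the logistic loss: -(1/n) sum_i y_i x_i / (1 + exp(y_i x_i^T w)).
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w = w - eta * grad
    risks.append(logistic_risk(w))
```

Tracking `risks` along the iterates gives the optimization-length axis of the figure; an early-stopping rule would pick an iterate before the risk curve enters the interpolation regime, e.g. via a held-out validation set.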