Rapid Overfitting of Multi-Pass SGD in Stochastic Convex Optimization
Authors: Shira Vansover-Hager, Tomer Koren, Roi Livni
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish a tight bound of Θ(1/(ηT) + η√T) on the population excess risk of multi-pass (without-replacement) SGD with step size η over T steps (see Theorems 3.1 and 3.3 in Section 3). This result applies to both the single-shuffle and multi-shuffle variants, and more generally to any multi-pass scheme that processes examples according to an arbitrary sequence of permutations. We also prove a similar tight Θ(1/(ηT) + η√T) population excess risk bound for with-replacement SGD that holds after O(n log n) steps (see Theorems 3.2 and 3.4). |
| Researcher Affiliation | Collaboration | 1 Blavatnik School of Computer Science, Tel Aviv University; 2 Google Research; 3 School of Electrical and Computer Engineering, Tel Aviv University. |
| Pseudocode | Yes | One-pass Stochastic Gradient Descent (SGD). One-pass SGD receives a training set S = {z_1, ..., z_n}, a step size η > 0, a suffix-averaging parameter s ∈ [n + 1], and a first-order oracle g_z(w) ∈ ∂f(w, z), and proceeds as: Initialize w_1 = 0; For t = 1, ..., n: w_{t+1} = Π_W(w_t − η g_{z_t}(w_t)); Output: ŵ_{n,s} = (1/s) Σ_{t=n−s+1}^{n} w_t. |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code, links to code repositories, or mention of code in supplementary materials. |
| Open Datasets | No | We study out-of-sample performance of multi-pass SGD in the context of the fundamental stochastic convex optimization (SCO) framework. A learning problem in SCO is defined by a convex and G-Lipschitz loss function f : W × Z → ℝ, where Z is a sample space endowed with a population distribution, and W is a convex, compact domain with diameter bounded by D. |
| Dataset Splits | No | The paper focuses on theoretical analysis and does not present empirical experiments using specific datasets, thus no dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not mention any software libraries, frameworks, or specific version numbers used for implementation or experiments. |
| Experiment Setup | No | The paper focuses on theoretical contributions to stochastic convex optimization and does not include details on experimental setup, hyperparameters, or training configurations. |
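To see how the two terms of a Θ(1/(ηT) + η√T) bound trade off against each other, one can balance them over the step size η (an illustrative calculus check, not a derivation taken from the paper):

```latex
\frac{d}{d\eta}\left(\frac{1}{\eta T} + \eta\sqrt{T}\right)
  = -\frac{1}{\eta^{2} T} + \sqrt{T} = 0
  \;\Longrightarrow\; \eta^{*} = T^{-3/4},
\qquad
\frac{1}{\eta^{*} T} \;=\; \eta^{*}\sqrt{T} \;=\; T^{-1/4}.
```

At this step size both terms contribute equally, so the bound is minimized at order T^{-1/4}.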
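The one-pass SGD procedure quoted in the pseudocode row can be sketched in a few lines. This is a minimal illustration, not the authors' code (the paper releases none): function names are ours, the domain W is simplified to a Euclidean ball of a given radius, and `grad(w, z)` plays the role of the first-order oracle g_z(w) ∈ ∂f(w, z).

```python
import numpy as np

def project_ball(w, radius):
    """Euclidean projection onto the ball of the given radius
    (a stand-in for the projection onto the general domain W)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def one_pass_sgd(samples, grad, eta, s, radius, dim):
    """One-pass projected SGD with suffix averaging over the last s iterates.

    samples : data points z_1, ..., z_n, processed once in order
    grad    : first-order oracle; grad(w, z) is a subgradient of f(., z) at w
    eta     : step size eta > 0
    s       : suffix-averaging parameter (number of final iterates averaged)
    """
    w = np.zeros(dim)          # initialize w_1 = 0
    iterates = [w]
    for z in samples:
        # w_{t+1} = Pi_W(w_t - eta * g_{z_t}(w_t))
        w = project_ball(w - eta * grad(w, z), radius)
        iterates.append(w)
    # output the suffix average (1/s) * sum of the last s iterates
    return np.mean(iterates[-s:], axis=0)
```

For instance, with the squared loss f(w, z) = ½‖w − z‖² (so `grad = lambda w, z: w - z`) and a stream of identical samples, the suffix average settles on that sample point, while the projection step keeps every iterate inside the ball.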