Rapid Overfitting of Multi-Pass SGD in Stochastic Convex Optimization

Authors: Shira Vansover-Hager, Tomer Koren, Roi Livni

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Theoretical We establish a tight bound of Θ(1/(ηT) + η√T) on the population excess risk of multi-pass (without-replacement) SGD with stepsize η over T steps (see Theorems 3.1 and 3.3 in Section 3). This result applies to the single-shuffle and multi-shuffle variants, and more generally to any multi-pass scheme that processes examples according to an arbitrary sequence of permutations. We also prove a similar tight Θ(1/(ηT) + η√T) population excess risk bound for with-replacement SGD, which holds after O(n log n) steps (see Theorems 3.2 and 3.4).
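As a quick sanity check (our own back-of-the-envelope calculation, not a claim from the paper), balancing the two terms of a bound of the form Θ(1/(ηT) + η√T) gives the stepsize that minimizes it:

```latex
\frac{1}{\eta T} = \eta\sqrt{T}
\;\Longrightarrow\;
\eta^{\star} = T^{-3/4},
\qquad
\Theta\!\left(\frac{1}{\eta^{\star} T} + \eta^{\star}\sqrt{T}\right)
= \Theta\!\left(T^{-1/4}\right).
```

In particular, any stepsize substantially larger than T^{-3/4} makes the η√T term dominate, which is the regime where the overfitting behavior in the paper's title kicks in.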
Researcher Affiliation Collaboration ¹Blavatnik School of Computer Science, Tel Aviv University; ²Google Research; ³School of Electrical and Computer Engineering, Tel Aviv University.
Pseudocode Yes One-pass Stochastic Gradient Descent (SGD). One-pass SGD receives a training set S = {z_1, ..., z_n}, a step size η > 0, a suffix-averaging parameter τ ∈ [n+1], and a first-order oracle O_z(w) ∈ ∂f(w, z), and proceeds as: Initialize: w_1 = 0. For t = 1, ..., n: w_{t+1} = Π_W(w_t − η O_{z_t}(w_t)). Output: ŵ_{n,τ} = (1/τ) Σ_{t=n−τ+2}^{n+1} w_t.
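The procedure above can be sketched in Python. This is a minimal illustration, not the authors' code: the absolute loss, the synthetic data, and the choice of W as a Euclidean ball are all assumptions made for the example.

```python
import numpy as np

def project_ball(w, radius):
    """Euclidean projection Pi_W onto a ball of the given radius
    (we take W to be a ball here; the paper allows any compact convex W)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def one_pass_sgd(samples, subgrad, eta, tau, radius, dim):
    """One-pass projected SGD, returning the average of the last tau iterates."""
    w = np.zeros(dim)
    iterates = []
    for z in samples:
        w = project_ball(w - eta * subgrad(w, z), radius)
        iterates.append(w)
    return np.mean(iterates[-tau:], axis=0)

# Example problem: absolute-loss regression f(w, z) = |w @ x - y|,
# which is convex and G-Lipschitz in w with G = ||x||.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=100)
samples = list(zip(X, y))

def subgrad(w, z):
    x, yi = z
    return np.sign(w @ x - yi) * x  # a subgradient of |w @ x - yi|

w_hat = one_pass_sgd(samples, subgrad, eta=0.1, tau=25, radius=2.0, dim=3)
```

Note that the suffix average is taken over the stored iterates, so τ must satisfy τ ≤ n in this sketch; the paper's τ ∈ [n+1] additionally allows averaging back to the initial point.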
Open Source Code No The paper does not contain any explicit statements about open-sourcing code, links to code repositories, or mention of code in supplementary materials.
Open Datasets No We study out-of-sample performance of multi-pass SGD in the context of the fundamental stochastic convex optimization (SCO) framework. A learning problem in SCO is defined by a convex and G-Lipschitz loss function f : W × Z → ℝ, where Z is a sample space endowed with a population distribution 𝒵, and W is a convex, compact domain with diameter bounded by D.
Dataset Splits No The paper focuses on theoretical analysis and does not present empirical experiments using specific datasets, thus no dataset splits are mentioned.
Hardware Specification No The paper is theoretical and does not describe any experimental setup, therefore no hardware specifications are mentioned.
Software Dependencies No The paper is theoretical and does not mention any software libraries, frameworks, or specific version numbers used for implementation or experiments.
Experiment Setup No The paper focuses on theoretical contributions to stochastic convex optimization and does not include details on experimental setup, hyperparameters, or training configurations.