Beyond Sub-Gaussian Noises: Sharp Concentration Analysis for Stochastic Gradient Descent

Authors: Wanrong Zhu, Zhipeng Lou, Wei Biao Wu

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a numerical study of the accuracy of the exact tail probability in (14) for $\nu = 3$. The true tail probability of the estimation error (LHS of (14)) can be calculated through the inversion formula. ... In Figure 2, we report the ratios $(1 - \Phi(x/\sqrt{\mu_{T,2}}))/P(S_T \ge x)$, $R(T, x)/P(S_T \ge x)$, and $(1 - \Phi(x/\sqrt{\mu_{T,2}}) + R(T, x))/P(S_T \ge x)$. We can see that the Gaussian approximation is good for small deviations, while the tail approximation is better when the deviation is moderate or large. The numerical study confirms that the polynomial term in the upper bound (11) is necessary in the case of heavy-tailed gradient noise, especially for moderate and large deviations.
Researcher Affiliation | Academia | Zhipeng Lou (EMAIL), Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Wanrong Zhu (EMAIL) and Wei Biao Wu (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA
Pseudocode | No | The paper provides mathematical equations for the SGD update rule (e.g., '$\theta_t = \theta_{t-1} - \eta_t \hat{g}_t(\theta_{t-1}),\ t \ge 1$'), but it does not include any clearly labeled pseudocode blocks, algorithms, or structured code-like procedures.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide any links to code repositories in the main text or supplementary materials.
Open Datasets | No | The paper describes using a 'linear regression model' and a 'mean estimation model' where data is 'i.i.d. generated from a t-distribution with degree of freedom $\nu > 2$'. It does not reference or provide concrete access information for any specific named public datasets.
Dataset Splits | No | The paper primarily deals with theoretical analysis and numerical studies using data generated from models. As such, it does not involve standard empirical datasets with explicit training/test/validation splits.
Hardware Specification | No | The paper includes a 'numerical study' in Section 4.3, but it does not provide any specific details about the hardware used for these computations, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific details about any ancillary software dependencies, including names of libraries, frameworks, or programming languages with their corresponding version numbers, used for the research or numerical studies.
Experiment Setup | Yes | We focus on the polynomial decay step size regime, i.e., $\eta_t = \eta_0 t^{-\alpha}$, with $\eta_0 = 0.1$, $\alpha = 0.55$ in the rest of this section. We conduct a numerical study of the accuracy of the exact tail probability in (14) for $\nu = 3$.
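
Since the paper ships no code, the following is a minimal Python sketch of what the reported setup could look like, not the authors' implementation. It assumes the mean estimation model uses the squared loss, so that the stochastic gradient is θ − X_t, and that the quantity of interest is the error of the last iterate; both are assumptions, as are all function names and the choices of T, the number of replicates, and the starting point. The tail probability here is estimated by plain Monte Carlo rather than by the inversion formula mentioned in the paper.

```python
import numpy as np


def step_size(t, eta0=0.1, alpha=0.55):
    """Polynomially decaying step size eta_t = eta0 * t**(-alpha)."""
    return eta0 * t ** (-alpha)


def sgd_mean_estimation(T, n_rep, nu=3.0, theta_star=0.0, theta0=0.0, seed=0):
    """Run n_rep independent SGD paths for mean estimation.

    Assumption (not taken from the paper): squared loss, so the stochastic
    gradient at theta is (theta - X_t), with X_t = theta_star + t_nu noise.
    Returns the last-iterate errors theta_T - theta_star, one per path.
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_rep, theta0, dtype=float)
    for t in range(1, T + 1):
        x = theta_star + rng.standard_t(nu, size=n_rep)  # heavy-tailed data
        grad = theta - x                                 # stochastic gradient
        theta = theta - step_size(t) * grad              # SGD update
    return theta - theta_star


def empirical_tail(errors, x):
    """Monte Carlo estimate of P(error >= x)."""
    return float(np.mean(errors >= x))


# Hypothetical run: T and n_rep are illustrative, not values from the paper.
errors = sgd_mean_estimation(T=500, n_rep=20000)
for x in (0.2, 0.5, 1.0):
    print(f"x = {x}: estimated tail probability = {empirical_tail(errors, x):.5f}")
```

Estimates at large deviations rely on few samples, so a much larger number of replicates would be needed before comparing them against a Gaussian or polynomial tail approximation.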