Beyond Sub-Gaussian Noises: Sharp Concentration Analysis for Stochastic Gradient Descent
Authors: Wanrong Zhu, Zhipeng Lou, Wei Biao Wu
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a numerical study of the accuracy of the exact tail probability in (14) for ν = 3. The true tail probability of the estimation error (LHS of (14)) can be calculated through the inversion formula. ... In figure 2, we report the ratios $[1 - \Phi(x/\sqrt{\mu_{T,2}})]/P(S_T \ge x)$, $R(T, x)/P(S_T \ge x)$ and $[1 - \Phi(x/\sqrt{\mu_{T,2}}) + R(T, x)]/P(S_T \ge x)$. We can see that the Gaussian approximation is good for small deviations, while the tail approximation is better when the deviation is moderate or large. The numerical study confirms that the polynomial term in the upper bound (11) is necessary in the case of heavy-tailed gradient noise, especially for moderate and large deviations. |
| Researcher Affiliation | Academia | Zhipeng Lou (EMAIL), Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Wanrong Zhu (EMAIL) and Wei Biao Wu (EMAIL), Department of Statistics, University of Chicago, Chicago, IL 60637, USA |
| Pseudocode | No | The paper provides mathematical equations for the SGD update rule (e.g., '$\theta_t = \theta_{t-1} - \eta_t \hat{g}_t(\theta_{t-1}),\ t \ge 1$'), but it does not include any clearly labeled pseudocode blocks, algorithms, or structured code-like procedures. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide any links to code repositories in the main text or supplementary materials. |
| Open Datasets | No | The paper describes using a 'linear regression model' and a 'mean estimation model' where data is 'i.i.d. generated from a t-distribution with degree of freedom ν > 2'. It does not reference or provide concrete access information for any specific named public datasets. |
| Dataset Splits | No | The paper primarily deals with theoretical analysis and numerical studies using data generated from models. As such, it does not involve standard empirical datasets with explicit training/test/validation splits. |
| Hardware Specification | No | The paper includes a 'numerical study' in Section 4.3, but it does not provide any specific details about the hardware used for these computations, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific details about any ancillary software dependencies, including names of libraries, frameworks, or programming languages with their corresponding version numbers, used for the research or numerical studies. |
| Experiment Setup | Yes | We focus on the polynomial decay step size regime, i.e., $\eta_t = \eta_0 t^{-\alpha}$, with $\eta_0 = 0.1$, $\alpha = 0.55$ in the rest of this section. We conduct a numerical study of the accuracy of the exact tail probability in (14) for ν = 3. |
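
Since the paper releases no code or pseudocode, the following is only a minimal sketch of the setup described in the table: the SGD update θ_t = θ_{t-1} − η_t ĝ_t(θ_{t-1}) with step size η_t = η_0 t^(−α), η_0 = 0.1, α = 0.55, applied to a mean estimation model with t-distributed noise (ν = 3). The quadratic loss (so that the stochastic gradient is θ − X_t), the function names, the initial value `theta0`, the true mean `theta_star`, and the horizon `T` are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def step_size(t, eta0=0.1, alpha=0.55):
    """Polynomially decaying step size eta_t = eta0 * t**(-alpha)."""
    return eta0 * t ** (-alpha)


def sample_t(rng, nu):
    """Draw one Student-t variate with nu degrees of freedom.

    Uses the representation Z / sqrt(chi2_nu / nu), where
    chi2_nu is Gamma(shape=nu/2, scale=2).
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(chi2 / nu)


def sgd_mean_estimation(T, theta_star=0.0, theta0=1.0, nu=3.0, seed=0):
    """Run T steps of SGD and return the final iterate theta_T.

    Assumption (not from the paper): the loss is E[(theta - X)^2] / 2,
    so the stochastic gradient at step t is theta - X_t.
    """
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, T + 1):
        x_t = theta_star + sample_t(rng, nu)  # heavy-tailed observation
        grad = theta - x_t                    # stochastic gradient estimate
        theta -= step_size(t) * grad          # SGD update
    return theta


def empirical_tail(errors, x):
    """Fraction of runs whose estimation error is at least x."""
    return sum(e >= x for e in errors) / len(errors)


if __name__ == "__main__":
    # Monte Carlo estimate of P(theta_T - theta_star >= x) over independent runs.
    errors = [sgd_mean_estimation(T=2000, seed=s) for s in range(500)]
    for x in (0.05, 0.1, 0.2):
        print(x, empirical_tail(errors, x))
```

Note that this sketch estimates the tail probability by simulation, whereas the paper computes the true tail probability through an inversion formula; the Monte Carlo estimate becomes unreliable for large deviations unless the number of runs is very large.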