Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD
Authors: Aniket Das, Dheeraj Nagaraj, Soumyabrata Pal, Arun Suggala, Prateek Varshney
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini [8]. |
| Researcher Affiliation | Collaboration | Aniket Das (Stanford University, EMAIL); Dheeraj Nagaraj (Google DeepMind, EMAIL); Soumyabrata Pal (Adobe Research, EMAIL); Arun Sai Suggala (Google DeepMind, EMAIL); Prateek Varshney (Stanford University, EMAIL) |
| Pseudocode | Yes | Algorithm 1 Clipped Stochastic Gradient Descent |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | This is a theoretical paper that focuses on mathematical derivations and analyses of algorithms under theoretical assumptions (e.g., heavy-tailed distributions), rather than empirical evaluation on specific datasets. Therefore, no dataset is mentioned or provided. |
| Dataset Splits | No | This is a theoretical paper and does not conduct experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper and does not specify software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | This is a theoretical paper and does not describe specific experimental setup details such as hyperparameters for empirical runs. It describes theoretical algorithm parameters (e.g., step sizes, clipping levels) as part of the algorithm design, not as an 'experimental setup' for empirical evaluation. |
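Although the paper is theoretical and ships no code, the Clipped SGD procedure it analyzes (Algorithm 1) is simple to state. The sketch below is a hypothetical illustration, not the authors' implementation: at each step the stochastic gradient is norm-clipped to a level `clip_level` before the usual SGD update, which tames heavy-tailed gradient noise. The toy problem (mean estimation under Student-t noise, cast as minimizing $\mathbb{E}[\tfrac12\|x-z\|^2]$) and all parameter values are illustrative choices, not from the paper.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, steps, eta, clip_level, rng):
    """Hypothetical sketch of Clipped SGD (Algorithm 1 in the paper):
    norm-clip each stochastic gradient to `clip_level`, then take an
    ordinary SGD step with step size `eta`."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        g = grad_fn(x, rng)            # stochastic gradient sample
        norm = np.linalg.norm(g)
        if norm > clip_level:          # g <- g * min(1, clip_level / ||g||)
            g = g * (clip_level / norm)
        x = x - eta * g                # standard SGD update
    return x

# Toy streaming example (illustrative, not from the paper): estimate the
# mean of a heavy-tailed distribution (Student-t, df=2.5, finite second
# moment but infinite higher moments), one sample per iteration.
rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0])

def grad_fn(x, rng):
    z = true_mean + rng.standard_t(df=2.5, size=2)  # heavy-tailed sample
    return x - z  # gradient of 0.5 * ||x - z||^2

est = clipped_sgd(grad_fn, np.zeros(2), steps=5000,
                  eta=0.01, clip_level=5.0, rng=rng)
```

The clipping threshold trades bias against variance: too small a `clip_level` biases the update direction, while too large a level lets rare heavy-tailed gradients dominate; the paper's analysis prescribes how to set it (together with the step size) to obtain sub-Gaussian rates.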