Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Authors: Zijian Liu, Zhengyuan Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, by revisiting the existing Batched Normalized Stochastic Gradient Descent with Momentum (Batched NSGDM) algorithm, we provide the first convergence result under heavy-tailed noises but without gradient clipping. Concretely, we prove that Batched NSGDM can achieve the optimal O(T^{(1-p)/(3p-2)}) rate even under the relaxed smooth condition. More interestingly, we also establish the first O(T^{(1-p)/(2p)}) convergence rate in the case where the tail index p is unknown in advance, which is arguably the common scenario in practice. |
| Researcher Affiliation | Academia | Zijian Liu & Zhengyuan Zhou Stern School of Business, New York University EMAIL |
| Pseudocode | Yes | Algorithm 1 Batched Normalized Stochastic Gradient Descent with Momentum (Batched NSGDM). Input: initial point x_1 ∈ R^d, batch size B ∈ N, momentum parameter β_t ∈ [0, 1], stepsize η_t > 0. for t = 1 to T do: g_t = (1/B) Σ_{i=1}^{B} g_t^i; m_t = β_t m_{t-1} + (1 − β_t) g_t, where m_0 ≜ g_1; x_{t+1} = x_t − η_t m_t / ‖m_t‖, where 0/0 ≜ 0; end for |
| Open Source Code | No | The paper does not contain any explicit statements about open-source code availability, links to repositories, or mentions of code in supplementary materials. |
| Open Datasets | No | This paper is theoretical and does not use any datasets for experiments. Therefore, no information about open datasets is provided. |
| Dataset Splits | No | This paper is theoretical and does not involve experiments with datasets. Consequently, there is no mention of training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on convergence proofs and algorithmic analysis. It does not describe any experiments requiring specific hardware, so no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not detail any implementation or experimental setup. Therefore, no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | This paper is primarily theoretical, focusing on mathematical proofs and convergence rates for an optimization algorithm. It does not describe any practical experimental setup, hyperparameters, or training configurations. |
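The pseudocode row above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the `grad_oracle(x, i)` interface (one stochastic gradient per call) and the per-step schedules `etas`/`betas` passed as sequences are assumptions for the sketch.

```python
import numpy as np

def batched_nsgdm(grad_oracle, x1, T, B, etas, betas):
    """Sketch of Batched NSGDM (Algorithm 1): mini-batch averaged gradients,
    exponential momentum, and a normalized update step.

    grad_oracle(x, i) is an assumed interface returning the i-th stochastic
    gradient at x; it is not specified in the paper.
    """
    x = np.asarray(x1, dtype=float)
    m = None  # m_0 is defined as the first batch-averaged gradient g_1
    for t in range(T):
        # g_t = (1/B) * sum_{i=1}^{B} g_t^i
        g = np.mean([grad_oracle(x, i) for i in range(B)], axis=0)
        if m is None:
            m = g  # m_0 ≜ g_1
        # m_t = β_t m_{t-1} + (1 − β_t) g_t
        m = betas[t] * m + (1.0 - betas[t]) * g
        norm = np.linalg.norm(m)
        # x_{t+1} = x_t − η_t m_t / ‖m_t‖, with the 0/0 ≜ 0 convention
        # realized by skipping the update when m_t = 0
        if norm > 0:
            x = x - etas[t] * m / norm
    return x
```

Because the step direction is normalized, the step size is controlled entirely by η_t regardless of gradient magnitude, which is what removes the need for gradient clipping under heavy-tailed noise.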