Robust Stochastic Optimization via Gradient Quantile Clipping
Authors: Ibrahim Merad, Stéphane Gaïffas
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an implementation of this algorithm using rolling quantiles which leads to a highly efficient optimization procedure with strong robustness properties, as confirmed by our numerical experiments. Finally, we provide experiments to demonstrate that QC-SGD can be easily and efficiently implemented by estimating Q_p(‖G̃(θ_t, ζ_t)‖) with rolling quantiles. In particular, we show that the iteration is indeed robust to heavy tails and corruption on multiple stochastic optimization tasks. |
| Researcher Affiliation | Academia | Ibrahim Merad EMAIL LPSM, UMR 8001 Université Paris Cité, Paris, France Stéphane Gaïffas EMAIL LPSM, UMR 8001 Université Paris Cité, Paris, France DMA, École normale supérieure |
| Pseudocode | Yes | Algorithm 1: Aggregation of cycling iterates; Algorithm 2: Rolling QC-SGD |
| Open Source Code | No | The paper does not provide an explicit statement or link for the source code of its own methodology. It only mentions: "We do not include a comparison with (Diakonikolas et al., 2022) whose procedure has no implementation we are aware of and is difficult to use in practice." |
| Open Datasets | Yes | Dataset for Sensorless Drive Diagnosis. UCI Machine Learning Repository, 2015. DOI: https://doi.org/10.24432/C5VP5F. Jock Blackard. Covertype. UCI Machine Learning Repository, 1998. DOI: https://doi.org/10.24432/C50K5N. Abdelhakim Hannousse and Salima Yahiouche. Web page phishing detection. Mendeley Data, 2, 2020. Byron Roe. MiniBooNE particle identification. UCI Machine Learning Repository, 2010. DOI: https://doi.org/10.24432/C5QC87. Cod-rna (Uzilov et al., 2006): 488,565 samples, 8 features, 2 classes, OpenML. |
| Dataset Splits | Yes | We use a 10% share of each dataset as a test set in order to compute the test loss plotted in Figures 2 and 3. We also ensure the test set contains at least 5000 elements. Optimization is run using the remaining train set which is corrupted as specified next. |
| Hardware Specification | No | The paper describes experimental results on synthetic and real datasets but does not provide any specific details about the hardware used for these experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers used for the experiments. It describes the algorithms and their implementation conceptually but lacks details on the programming languages, libraries, or frameworks with their versions. |
| Experiment Setup | Yes | Our experiments on synthetic data consider an infinite horizon, dimension d = 128, and a constant step size for all methods. We use step size β = 10⁻³. We use step size β = 6 × 10⁻³. We use one sample per iteration and step size β = 10⁻² for all methods. As previously, RQC-SGD is run with buffer size S = 100 and τ_unif = 10. The quantile value was set to p = 0.9. We compute the gradient norms over a batch of samples of size S at the beginning of the optimization and use the quantiles of order p = 0.25, 0.5 and 0.75 as the clipping level for the constant clipping baselines. |
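The rolling-quantile clipping described in the evidence above (clip each stochastic gradient at the p-quantile of recently observed gradient norms, with buffer size S = 100 and p = 0.9) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `rolling_qc_sgd`, the `grad_fn` interface, and the test objective are assumptions introduced here; only the hyperparameter names (β, p, S) follow the paper.

```python
# Illustrative sketch of SGD with rolling quantile clipping (RQC-SGD-style).
# All names here are hypothetical; only beta, p, and S mirror the paper's notation.
from collections import deque
import numpy as np

def rolling_qc_sgd(grad_fn, theta0, n_steps, beta=1e-2, p=0.9, S=100, seed=0):
    """SGD where each stochastic gradient is clipped at the rolling
    p-quantile of the last S observed gradient norms."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    norms = deque(maxlen=S)  # rolling buffer of the last S gradient norms
    for _ in range(n_steps):
        g = grad_fn(theta, rng)          # stochastic gradient estimate
        gnorm = np.linalg.norm(g)
        norms.append(gnorm)
        tau = np.quantile(norms, p)      # rolling estimate of the p-quantile
        if gnorm > tau:
            g = g * (tau / gnorm)        # rescale so the clipped norm is tau
        theta = theta - beta * g
    return theta

# Hypothetical usage: least-squares-style gradient with heavy-tailed
# (Student-t, df=2) noise, where plain SGD would be destabilized by outliers.
def heavy_tailed_grad(theta, rng):
    return theta - 1.0 + rng.standard_t(df=2, size=theta.shape)

theta_hat = rolling_qc_sgd(heavy_tailed_grad, np.zeros(4), n_steps=5000)
```

Because the clipping level adapts to the empirical norm distribution rather than being fixed in advance, the same quantile order p can be reused across problems with very different gradient scales, which is the practical advantage over the constant-clipping baselines mentioned in the setup.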