Robust Stochastic Optimization via Gradient Quantile Clipping

Authors: Ibrahim Merad, Stéphane Gaïffas

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an implementation of this algorithm using rolling quantiles which leads to a highly efficient optimization procedure with strong robustness properties, as confirmed by our numerical experiments. Finally, we provide experiments to demonstrate that QC-SGD can be easily and efficiently implemented by estimating Qp(‖G̃(θt, ζt)‖) with rolling quantiles. In particular, we show that the iteration is indeed robust to heavy tails and corruption on multiple stochastic optimization tasks.
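The paper's Algorithm 2 (Rolling QC-SGD) is not reproduced in this summary; the following is a minimal sketch of quantile clipping with a rolling buffer of gradient norms, using the buffer size S = 100 and quantile p = 0.9 reported in the experiments. The function and variable names are ours, not the paper's.

```python
import numpy as np

def rolling_qc_sgd_step(theta, grad, buffer, p=0.9, beta=1e-3, S=100):
    """One sketched QC-SGD step: clip the stochastic gradient at the
    rolling p-quantile of recently observed gradient norms.

    `buffer` is a plain list holding the last <= S gradient norms."""
    g_norm = np.linalg.norm(grad)
    # Rolling estimate of the p-quantile Q_p over the buffered norms;
    # before any norm is buffered, fall back to the current norm (no clip).
    tau = np.quantile(buffer, p) if buffer else g_norm
    # Clip: rescale the gradient whenever its norm exceeds the threshold.
    if g_norm > tau and g_norm > 0:
        grad = grad * (tau / g_norm)
    # Update the rolling window of norms.
    buffer.append(g_norm)
    if len(buffer) > S:
        buffer.pop(0)
    # Plain SGD update with constant step size beta.
    return theta - beta * grad, buffer
```

On a well-conditioned objective the clipping threshold tracks the decaying gradient norms, so the iteration behaves like constant-step SGD while capping the influence of heavy-tailed or corrupted samples.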
Researcher Affiliation | Academia | Ibrahim Merad (LPSM, UMR 8001, Université Paris Cité, Paris, France); Stéphane Gaïffas (LPSM, UMR 8001, Université Paris Cité, Paris, France; DMA, École normale supérieure)
Pseudocode | Yes | Algorithm 1: Aggregation of cycling iterates; Algorithm 2: Rolling QC-SGD
Open Source Code | No | The paper does not provide an explicit statement or link for the source code of its own methodology. It only mentions: "We do not include a comparison with (Diakonikolas et al., 2022) whose procedure has no implementation we are aware of and is difficult to use in practice."
Open Datasets | Yes | Dataset for Sensorless Drive Diagnosis, UCI Machine Learning Repository, 2015, DOI: https://doi.org/10.24432/C5VP5F; Jock Blackard, Covertype, UCI Machine Learning Repository, 1998, DOI: https://doi.org/10.24432/C50K5N; Abdelhakim Hannousse and Salima Yahiouche, Web page phishing detection, Mendeley Data, 2, 2020; Byron Roe, MiniBooNE particle identification, UCI Machine Learning Repository, 2010, DOI: https://doi.org/10.24432/C5QC87; Codrna (Uzilov et al., 2006), 488,565 samples, 8 features, 2 classes, OpenML
Dataset Splits | Yes | We use a 10% share of each dataset as a test set in order to compute the test loss plotted in Figures 2 and 3. We also ensure the test set contains at least 5000 elements. Optimization is run using the remaining train set which is corrupted as specified next.
Hardware Specification | No | The paper describes experimental results on synthetic and real datasets but does not provide any specific details about the hardware used for these experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers used for the experiments. It describes the algorithms and their implementation conceptually but lacks details on the programming languages, libraries, or frameworks with their versions.
Experiment Setup | Yes | Our experiments on synthetic data consider an infinite horizon, dimension d = 128, and a constant step size for all methods. We use step size β = 10⁻³. We use step size β = 6 × 10⁻³. We use one sample per iteration and step size β = 10⁻² for all methods. As previously, RQC-SGD is run with buffer size S = 100 and τ_unif = 10. The quantile value was set to p = 0.9. We compute the gradient norms over a batch of samples of size S at the beginning of the optimization and use the quantiles of order p = 0.25, 0.5 and 0.75 as the clipping level for the constant clipping baselines.
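The constant-clipping baselines described above fix their clipping levels once, from quantiles of gradient norms computed on an initial batch of size S. A minimal sketch; the heavy-tailed draw standing in for per-sample gradient norms is illustrative, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-sample gradient norms on an initial batch of size S = 100;
# a shifted Pareto draw mimics the heavy-tailed setting studied in the paper.
grad_norms = rng.pareto(2.0, size=100) + 1.0
# Constant clipping levels for the baselines: quantiles of order 0.25, 0.5, 0.75,
# each kept fixed for the whole optimization run.
clip_levels = {p: np.quantile(grad_norms, p) for p in (0.25, 0.5, 0.75)}
```

Unlike Rolling QC-SGD, these levels never adapt after initialization, which is what the paper's comparison is designed to probe.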