Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Authors: Zeyuan Allen-Zhu

JMLR 2017

Reproducibility Variable Result LLM Response
Research Type Experimental We conclude this paper with empirical evaluations of our theoretical speed-ups. We work on Lasso and ridge regressions (with regularizer (λ/2)‖x‖² for ridge and regularizer λ‖x‖₁ for Lasso) on the following six datasets: adult, web, mnist, rcv1, covtype, sensit. We defer dataset and implementation details to Appendix B.
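For concreteness, the two objectives quoted above fit the paper's composite template (this is a restatement of the quoted regularizers in the paper's finite-sum notation, not additional detail from the report):

```latex
\min_{x \in \mathbb{R}^d} \; F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) + \psi(x),
\qquad
\psi(x) =
\begin{cases}
\frac{\lambda}{2}\|x\|_2^2 & \text{(ridge)} \\[2pt]
\lambda\|x\|_1 & \text{(Lasso)}
\end{cases}
```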
Researcher Affiliation Industry Zeyuan Allen-Zhu, Microsoft Research AI, Redmond, WA 98052, USA
Pseudocode Yes Algorithm 1 Katyusha(x0, S, σ, L) ... Algorithm 2 Katyushans(x0, S, σ, L) ... Algorithm 3 Katyusha1(x0, S, σ, L, (L1, . . . , Ln), b) ... Algorithm 4 Katyusha2(x0, S, σ, (L1, . . . , Ln)) ... Algorithm 5 Katyusha2ns(x0, S, σ, (L1, . . . , Ln))
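The algorithms above are given as pseudocode in the paper. As a rough illustration, here is a hedged Python sketch of Algorithm 1 (Katyusha), specialized to the smooth case ψ = 0 so that both proximal steps reduce to explicit updates. The parameter choices (τ₂ = 1/2, τ₁ = min{√(mσ/3L), 1/2}, α = 1/(3τ₁L), m = 2n) follow the paper, but the function names and interface are illustrative assumptions, not the author's code:

```python
import numpy as np

def katyusha(grad_i, full_grad, n, x0, sigma, L, S, m=None):
    """Sketch of Katyusha (Alg. 1) for the smooth case (psi = 0).

    grad_i(x, i): gradient of component f_i at x
    full_grad(x): full gradient (1/n) * sum_i grad f_i(x)
    sigma: strong convexity parameter; L: smoothness; S: number of epochs.
    """
    m = m or 2 * n
    tau2 = 0.5
    tau1 = min(np.sqrt(m * sigma / (3.0 * L)), 0.5)
    alpha = 1.0 / (3.0 * tau1 * L)
    x_tilde, y, z = x0.copy(), x0.copy(), x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(S):
        mu = full_grad(x_tilde)                 # snapshot (full) gradient
        ys, ws, w = [], [], 1.0
        for _ in range(m):
            # three-point coupling of momentum, snapshot, and gradient iterates
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = mu + grad_i(x, i) - grad_i(x_tilde, i)  # variance-reduced gradient
            z = z - alpha * g                   # mirror step (Euclidean, psi = 0)
            y = x - g / (3.0 * L)               # gradient step
            ys.append(y); ws.append(w)
            w *= 1 + alpha * sigma              # weight theta^j, theta = 1 + alpha*sigma
        x_tilde = np.average(ys, axis=0, weights=ws)  # weighted snapshot update
    return x_tilde
```

On a small strongly convex problem (e.g. ridge regression with σ = λ and L = maxᵢ‖aᵢ‖² + λ), this sketch converges to the minimizer within a few dozen epochs.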
Open Source Code No The paper does not explicitly state that source code for the described methodology is publicly available or provide a link to a repository.
Open Datasets Yes The datasets we used in this paper are downloaded from the LibSVM website (Fan and Lin): the adult (a9a) dataset (32,561 samples and 123 features); the web (w8a) dataset (49,749 samples and 300 features); the covtype (binary.scale) dataset (581,012 samples and 54 features); the mnist (class 1) dataset (60,000 samples and 780 features); the rcv1 (train.binary) dataset (20,242 samples and 47,236 features); the sensit (combined) dataset (78,823 samples and 100 features).
Dataset Splits No The paper mentions using various datasets for empirical evaluations but does not explicitly specify training, test, or validation splits. It discusses parameter tuning and plots of the 'training objective distance to the minimum', but gives no details on data partitioning.
Hardware Specification No The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running the experiments.
Software Dependencies No The paper mentions implementing various algorithms and discusses parameter tuning. However, it does not specify any software dependencies like programming languages, libraries, or solvers with version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup Yes Algorithms and Parameter Tuning. We have implemented the following algorithms, all with mini-batch size 1 for this version of the paper: SVRG (Johnson and Zhang, 2013) with default epoch length m = 2n; we tune only one parameter: the learning rate. Katyusha for ridge and Katyusha^ns for Lasso; we tune only one parameter: the learning rate. SAGA (Defazio et al., 2014); we tune only one parameter: the learning rate. Catalyst (Lin et al., 2015) on top of SVRG; we tune three parameters: SVRG's learning rate, Catalyst's learning rate, as well as the regularizer weight in the Catalyst reduction. APCG (Lin et al., 2014); we tune the learning rate, and for Lasso we also tune the ℓ2 regularizer weight. APCG+AdaptReg (Lasso only); two parameters to be tuned: APCG's learning rate and σ0 in AdaptReg. We emphasize that Katyusha is as simple as SAGA or SVRG in terms of parameter tuning. In contrast, APCG for Lasso requires two parameters to be tuned, and Catalyst requires three. We select learning rates from the set {10^−k, 2·10^−k, 5·10^−k : k ∈ ℤ}, and select regularizer weights (for APCG) from the set {10^−k : k ∈ ℤ}. For Catalyst, in principle one also has to tune the stopping criterion. After communicating with an author of Catalyst, we learned that one can terminate the inner loop whenever the duality gap becomes no more than, say, one fourth of the last duality gap from the previous epoch (Lin, 2016).
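The learning-rate grid described above is straightforward to enumerate. A minimal sketch (the helper name and the finite range of k are assumptions, since the paper leaves k unbounded over ℤ):

```python
def lr_grid(k_min=1, k_max=6):
    """Candidate learning rates {10^-k, 2*10^-k, 5*10^-k} for k in [k_min, k_max].

    k_min/k_max bound the grid for practical search; the paper's grid is over all
    integers k, so in practice one widens the range until the best rate is interior.
    """
    return sorted(c * 10.0 ** -k for k in range(k_min, k_max + 1) for c in (1, 2, 5))
```

For example, `lr_grid(1, 3)` yields the nine candidates from 0.001 up to 0.5 in the 1-2-5 pattern.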