Improved Asynchronous Parallel Optimization Analysis for Stochastic Incremental Methods

Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

JMLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present results of an implementation on a 40-core architecture illustrating the practical speedups as well as the hardware overhead. (...) In Section 6, we provide a practical implementation of Asaga and illustrate its performance on a 40-core architecture, showing improvements compared to asynchronous variants of Svrg and Sgd. We also present experiments on the overlap bound τ, showing that it encompasses much more complexity than suggested in previous work."
Researcher Affiliation | Academia | Rémi Leblond (EMAIL), INRIA Sierra Project-Team, École Normale Supérieure, Paris; Fabian Pedregosa (EMAIL), INRIA Sierra Project-Team, École Normale Supérieure, Paris; Simon Lacoste-Julien (EMAIL), Department of CS & OR (DIRO), Université de Montréal, Montréal
Pseudocode | Yes | Algorithm 1: Asaga (analyzed algorithm); Algorithm 2: Asaga (implementation); Algorithm 3: Kromagnon (Mania et al., 2017); Algorithm 4: Ahsvrg; Algorithm 5: Hogwild
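To make the lock-free pattern behind Algorithm 5 (Hogwild) concrete, here is a minimal illustrative sketch, not the paper's code: several threads run SGD steps on a shared parameter vector with no synchronization at all. The toy least-squares objective, data, and step size are our own assumptions.

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hogwild-style sketch: 4 threads apply SGD updates to a SHARED iterate
// `w` with unsynchronized reads and writes. Toy problem: least-squares
// fit of y = 3*x0 - 2*x1 + noise (all choices here are illustrative).
public class HogwildSketch {
    static final int D = 2, N = 1000, THREADS = 4;
    static final double[] w = new double[D];      // shared iterate, no locks
    static final double[][] x = new double[N][D];
    static final double[] y = new double[N];

    static void sgdSteps(int steps, long seed) {
        Random rnd = new Random(seed);
        double gamma = 0.05;                      // fixed step size (assumed)
        for (int t = 0; t < steps; t++) {
            int i = rnd.nextInt(N);               // sample uniformly at random
            double r = w[0] * x[i][0] + w[1] * x[i][1] - y[i]; // residual
            for (int k = 0; k < D; k++)
                w[k] -= gamma * 2 * r * x[i][k];  // unsynchronized write
        }
    }

    public static double[] run() {
        Random rnd = new Random(0);
        for (int i = 0; i < N; i++) {             // synthetic regression data
            x[i][0] = rnd.nextGaussian();
            x[i][1] = rnd.nextGaussian();
            y[i] = 3 * x[i][0] - 2 * x[i][1] + 0.01 * rnd.nextGaussian();
        }
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int c = 0; c < THREADS; c++) {
            final long seed = c;
            pool.submit(() -> sgdSteps(20000, seed));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = run();
        System.out.printf("w = (%.2f, %.2f)%n", w[0], w[1]);
    }
}
```

Despite the data races, the iterate ends up close to the true coefficients, which is exactly the phenomenon the Hogwild analysis (and the Asaga analysis extending it) formalizes.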
Open Source Code | Yes | "The code we used to run all the experiments is available at http://www.di.ens.fr/sierra/research/asaga/."
Open Datasets | Yes | "We consider two sparse data sets: RCV1 (Lewis et al., 2004) and URL (Ma et al., 2009); and a dense one, Covtype (Collobert et al., 2002), with statistics listed in the table below. (...) This data set was obtained from the libsvmtools project: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html"
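The libsvmtools datasets are distributed in the LIBSVM sparse text format, one example per line: `<label> <index>:<value> <index>:<value> ...`. A minimal parser sketch for one such line (our own helper class, not taken from the paper's code):

```java
// Parses a single line of the LIBSVM sparse format, e.g. "+1 3:0.5 10:1.25".
// Feature indices are kept 1-based, as they appear in the files.
public class LibsvmLine {
    public final double label;
    public final int[] indices;    // sorted feature indices from the line
    public final double[] values;  // corresponding feature values

    public LibsvmLine(String line) {
        String[] tok = line.trim().split("\\s+");
        label = Double.parseDouble(tok[0]);
        indices = new int[tok.length - 1];
        values = new double[tok.length - 1];
        for (int j = 1; j < tok.length; j++) {
            int colon = tok[j].indexOf(':');
            indices[j - 1] = Integer.parseInt(tok[j].substring(0, colon));
            values[j - 1] = Double.parseDouble(tok[j].substring(colon + 1));
        }
    }

    public static void main(String[] args) {
        LibsvmLine ex = new LibsvmLine("+1 3:0.5 10:1.25");
        System.out.println(ex.label + " nnz=" + ex.indices.length);
    }
}
```

Storing only the nonzero (index, value) pairs is what makes the sparse updates discussed in the paper (and the speedups on RCV1 and URL) possible in the first place.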
Dataset Splits | No | The paper does not explicitly mention training/validation/test splits, split percentages, or sample counts for the datasets used. It refers to the datasets generally and mentions the task as binary classification, but provides no specific split information.
Hardware Specification | Yes | "All experiments were run on a Dell PowerEdge 920 machine with 4 Intel Xeon E7-4830v2 processors with 10 2.2GHz cores each and 384GB 1600 MHz RAM."
Software Dependencies | Yes | "All algorithms were implemented in the Scala language and the software stack consisted of a Linux operating system running Scala 2.11.7 and Java 1.6. (...) For compare-and-swap instructions we used the AtomicDoubleArray class from the Google library Guava."
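Guava's `AtomicDoubleArray` provides the compare-and-swap primitive the implementation relies on; internally it reinterprets each double as a long and delegates to the JDK's `AtomicLongArray`. A self-contained sketch of that idea, using only the standard library (class and method names below are ours, not the paper's):

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Lock-free double array built on AtomicLongArray via bit reinterpretation,
// mirroring how Guava's AtomicDoubleArray implements CAS on doubles.
public class CasDoubleArray {
    private final AtomicLongArray bits;

    public CasDoubleArray(int length) {
        // AtomicLongArray zero-initializes, and 0L is the bit pattern of 0.0.
        bits = new AtomicLongArray(length);
    }

    public double get(int i) {
        return Double.longBitsToDouble(bits.get(i));
    }

    // Atomically add delta to slot i, retrying until the CAS succeeds.
    public void add(int i, double delta) {
        while (true) {
            long cur = bits.get(i);
            long next = Double.doubleToRawLongBits(
                    Double.longBitsToDouble(cur) + delta);
            if (bits.compareAndSet(i, cur, next)) return;
        }
    }

    public static void main(String[] args) {
        CasDoubleArray a = new CasDoubleArray(3);
        a.add(1, 1.5);
        a.add(1, 2.5);
        System.out.println(a.get(1));
    }
}
```

The retry loop is the standard CAS idiom: if another core updated the slot between the read and the `compareAndSet`, the swap fails and the addition is recomputed against the fresh value, so no increment is ever lost.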
Experiment Setup | Yes | "Following Schmidt et al. (2016), the amount of regularization used was set to µ = 1/n. In each update, we project the gradient of the regularization term (we multiply it by Di as we also do with the vector α) to preserve the sparsity pattern while maintaining an unbiased estimate of the gradient. (...) For each algorithm, we picked the best step size among 10 equally spaced values in a grid, and made sure that the best step size was never at the boundary of this interval. For Covtype and RCV1, we used the interval [1/L], whereas for URL we used the interval [1/L] as it admitted larger step sizes. It turns out that the best step size was fairly constant for different number of cores for both Asaga and Kromagnon, and both algorithms had similar best step sizes (0.7 for RCV1, 0.05 for URL and 5·10⁻⁵ for Covtype)."
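The step-size selection protocol quoted above can be sketched as a small grid search with a boundary check: score 10 equally spaced candidates and reject the run if the winner sits on either end of the interval (which would mean the interval should be widened). The stand-in quadratic objective below is our own assumption, not the paper's training loss.

```java
import java.util.function.DoubleUnaryOperator;

// Grid search over step sizes, as described in the experiment setup:
// 10 equally spaced values, best one kept, boundary winners rejected.
public class StepSizeGrid {
    public static double pickStepSize(double lo, double hi, int points,
                                      DoubleUnaryOperator score) {
        double best = lo, bestScore = Double.POSITIVE_INFINITY;
        for (int j = 0; j < points; j++) {
            double gamma = lo + (hi - lo) * j / (points - 1);
            double s = score.applyAsDouble(gamma);   // e.g. final training loss
            if (s < bestScore) { bestScore = s; best = gamma; }
        }
        if (best == lo || best == hi)
            throw new IllegalStateException(
                    "best step size on grid boundary; widen the interval");
        return best;
    }

    public static void main(String[] args) {
        // Stand-in "training loss" minimized near gamma = 0.52.
        double g = pickStepSize(0.1, 1.0, 10, x -> (x - 0.52) * (x - 0.52));
        System.out.println(g);
    }
}
```

The boundary check is what justifies the paper's claim that the reported best step sizes (0.7, 0.05, 5·10⁻⁵) are interior optima rather than artifacts of too narrow a search range.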