An Asynchronous Parallel Stochastic Coordinate Descent Algorithm

Authors: Ji Liu, Stephen J. Wright, Christopher Ré, Victor Bittorf, Srikrishna Sridhar

JMLR 2015 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We illustrate the behavior of two variants of the stochastic coordinate descent approach on test problems constructed from several data sets. Our interests are in the efficiency of multicore implementations (by comparison with a single-threaded implementation) and in performance relative to alternative solvers for the same problems. All our test problems have the form (1), with either Ω= Rn or Ωseparable as in (2). The objective f is quadratic, that is, 2x T Qx + c T x, with Q symmetric positive definite. Our implementation of Asy SCD is called DIMM-WITTED (or DW for short). It runs on various numbers of threads, from 1 to 40, each thread assigned to a single core in our 40core Intel Xeon architecture.
Researcher Affiliation Academia Ji Liu EMAIL Stephen J. Wright EMAIL Department of Computer Sciences University of Wisconsin-Madison Madison, WI 53706-1685 Christopher R e EMAIL Department of Computer Science Stanford University 353 Serra Mall Stanford, CA 94305-9025 Victor Bittorf EMAIL Srikrishna Sridhar EMAIL Department of Computer Sciences University of Wisconsin-Madison Madison, WI 53706-1685
Pseudocode Yes Algorithm 1 Asynchronous Stochastic Coordinate Descent Algorithm x K+1 = Asy SCD(x0, γ, K) Require: x0 Ω, γ, and K Ensure: x K+1 1: Initialize j 0; 2: while j K do 3: Choose i(j) from {1, . . . , n} with equal probability; 4: xj+1 PΩ xj γ Lmax ei(j) i(j)f(xk(j)) ; 5: j j + 1; 6: end while
Open Source Code No The paper mentions its implementation called DIMM-WITTED (DW) and describes its architecture tailoring but does not explicitly state that the code is publicly available or provide a link to a repository.
Open Datasets Yes All data sets used in 4 except reuters were obtained from the LIBSVM data set repository.1 The data set reuters is a sparse binary text classification data set constructed as a one-versus-all version of Reuters-2159.2 1. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ 2. http://www.daviddlewis.com/resources/testcollections/reuters21578/
Dataset Splits No The paper uses synthetic data for some problems and mentions datasets from LIBSVM and Reuters. It discusses 'Train time(sec)' in Table 4, but it does not specify exact training, validation, or test splits (e.g., percentages or counts) for any of the datasets used in its experiments.
Hardware Specification Yes Our implementation, described in Section 6, is a little more complex than this simple model would suggest, as it is tailored to the architecture of the Intel Xeon machine that we use for experiments. Our implementation of Asy SCD is called DIMM-WITTED (or DW for short). It runs on various numbers of threads, from 1 to 40, each thread assigned to a single core in our 40core Intel Xeon architecture.
Software Dependencies No The paper mentions comparing Asy SCD to LIBSVM (Chang and Lin, 2011) and using it for experiments, but it does not provide specific version numbers for LIBSVM or any other software dependencies.
Experiment Setup Yes In Algorithm 1, we set the steplength parameter γ to 1, and we choose initial iterate to be x0 = 0. ... The methodology for generating A and x and for choosing the values of m, n, γ, and x0 is the same as for (27). ... min x [0,1]n c T x + β 2 Ax b 2 + 1 2β x 2, (29) with β = 5. ... We investigate both shuffling after every epoch (p = 1) and after every tenth epoch (p = 10). ... Wall clock times required for the four test problems on 1 and 40 cores, to reduce residuals below 10 5 are shown in Table 1. ... achieve a residual below 0.06. ... based on the residual 10 3.