Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement
Authors: Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct numerical experiments with simulated and real data to compare our DSVRG algorithm with DiSCO (Zhang and Lin, 2015) and a distributed implementation of the gradient descent (GD) method. [...] The performance of the three methods is shown in Figure 2, where the vertical axis represents the logarithm of the optimality gap (i.e., log(f(x_k) − f(x*))) and the horizontal axis represents the number of rounds of communication. |
| Researcher Affiliation | Academia | Jason D. Lee, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA; Qihang Lin, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA; Tengyu Ma, Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Tianbao Yang, Department of Computer Science, University of Iowa, Iowa City, IA 52242, USA |
| Pseudocode | Yes | Algorithm 1 Distributed SVRG (DSVRG) [...] Algorithm 2 Single-Stage SVRG: SS-SVRG(x̃, {f_i}_{i∈[N]}, h, {R_j}_{j∈[m]}, k, η, T) [...] Algorithm 3 Distributed Accelerated SVRG (DASVRG) [...] Algorithm 4 Data Allocation: DA(N, m, Q, α) |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository, an explicit code release statement for the methodology described, or mention of code in supplementary materials. |
| Open Datasets | Yes | In this section, we conduct numerical experiments with real data to compare our DSVRG and DASVRG algorithms with DisDCA by Yang (2013) and a distributed implementation of the accelerated gradient method (Accel Grad) by Nesterov (2013). We apply these four algorithms to the ERM problem (2) with three data sets: Covtype, Million Song and Epsilon. [...] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html |
| Dataset Splits | No | The paper mentions characteristics of the datasets (e.g., 'After generating RFF, Covtype data has N = 522,911 examples and d = 1,000 features, and Million Song data has N = 463,715 examples and d = 2,000 features.'), but does not explicitly provide training/test/validation splits, proportions, or specific methods for creating them beyond stating general dataset properties. |
| Hardware Specification | Yes | We use a single machine to simulate the distributed environment and all methods are implemented in Matlab running on a 64-bit Microsoft Windows 10 machine with a 2.70 GHz Intel(R) i7-6820HQ CPU and 8GB of memory. [...] The experiments are conducted on a server (Intel(R) Xeon(R) CPU E5-2667 v2, 3.30 GHz) with multiple processes, with each process simulating one machine. |
| Software Dependencies | No | The paper mentions 'all methods are implemented in Matlab' but does not specify a version number for Matlab or any other specific software libraries or dependencies with version numbers. |
| Experiment Setup | Yes | We choose λ = 10⁻⁴, d = 50, N = 10⁴, ω ∈ {0, 0.3, 0.5} and m ∈ {10, 20}. [...] In this experiment, we choose α = 1 [...] In the GD method, we use a step length of η = 1/L. [...] We choose the regularization parameter λ to be 1/N^0.5, 1/N^0.75 and 1/N. For each setting, L is computed as max_{i=1,...,N} ‖a_i‖² γ + λ [...] We implement DSVRG by choosing η = 1/L, T = 10,000 and K = N/T. For DASVRG, we choose η = 1/L, T = 10,000, K = 1 and P = N/T. |
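The Experiment Setup row can be made concrete with a short sketch of how the quoted hyperparameters fit together. This is an illustrative reconstruction, not the authors' code: the function name `dsvrg_hyperparams` is hypothetical, and it assumes the quoted smoothness constant is L = max_i ‖a_i‖² γ + λ (with γ the smoothness constant of the per-example loss), from which the paper's choices η = 1/L and K = N/T follow.

```python
import numpy as np

def dsvrg_hyperparams(A, lam, gamma, T=10_000):
    """Hypothetical helper mirroring the quoted experiment setup.

    A     : (N, d) data matrix with rows a_i
    lam   : regularization parameter lambda (e.g. 1/N**0.5)
    gamma : smoothness constant of the loss (assumption: the quoted
            formula L = max_i ||a_i||^2 * gamma + lam)
    T     : inner-loop iterations per stage (paper uses 10,000)
    """
    N = A.shape[0]
    # Smoothness bound over the dataset, per the quoted formula.
    L = np.max(np.sum(A ** 2, axis=1)) * gamma + lam
    eta = 1.0 / L      # step size eta = 1/L, as in the paper
    K = N // T         # number of DSVRG stages K = N/T
    return {"L": L, "eta": eta, "T": T, "K": K}

# Toy example with synthetic data: N = 40,000 samples, d = 50 features.
rng = np.random.default_rng(0)
A = rng.standard_normal((40_000, 50))
params = dsvrg_hyperparams(A, lam=1 / 40_000 ** 0.5, gamma=0.25)
```

With N = 40,000 and T = 10,000 this yields K = 4 stages; the regularization settings 1/N^0.75 and 1/N from the quote would be swapped in the same way via the `lam` argument.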