Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement
Authors: Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct numerical experiments with simulated and real data to compare our DSVRG algorithm with DiSCO (Zhang and Lin, 2015) and a distributed implementation of the gradient descent (GD) method. [...] The performance of the three methods is shown in Figure 2, where the vertical axis represents the logarithm of the optimality gap (i.e., log(f(x_k) − f(x*))) and the horizontal axis represents the number of rounds of communication. |
| Researcher Affiliation | Academia | Jason D. Lee, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA; Qihang Lin, Tippie College of Business, University of Iowa, Iowa City, IA 52242, USA; Tengyu Ma, Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Tianbao Yang, Department of Computer Science, University of Iowa, Iowa City, IA 52242, USA |
| Pseudocode | Yes | Algorithm 1 Distributed SVRG (DSVRG) [...] Algorithm 2 Single-Stage SVRG: SS-SVRG(x̃, {f_i}_{i∈[N]}, h, {R_j}_{j∈[m]}, k, η, T) [...] Algorithm 3 Distributed Accelerated SVRG (DASVRG) [...] Algorithm 4 Data Allocation: DA(N, m, Q, α) |
| Open Source Code | No | The paper does not provide a direct link to a source-code repository, an explicit code release statement for the methodology described, or mention of code in supplementary materials. |
| Open Datasets | Yes | In this section, we conduct numerical experiments with real data to compare our DSVRG and DASVRG algorithms with DisDCA by Yang (2013) and a distributed implementation of the accelerated gradient method (Accel Grad) by Nesterov (2013). We apply these four algorithms to the ERM problem (2) with three data sets: Covtype, Million Song and Epsilon. [...] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html |
| Dataset Splits | No | The paper mentions characteristics of the datasets (e.g., 'After generating RFF, Covtype data has N = 522,911 examples and d = 1,000 features, and Million Song data has N = 463,715 examples and d = 2,000 features.'), but does not explicitly provide training/test/validation splits, proportions, or specific methods for creating them beyond stating general dataset properties. |
| Hardware Specification | Yes | We use a single machine to simulate the distributed environment and all methods are implemented in Matlab running on a 64-bit Microsoft Windows 10 machine with a 2.70 GHz Intel(R) i7-6820HQ CPU and 8GB of memory. [...] The experiments are conducted on a server (Intel(R) Xeon(R) CPU E5-2667 v2, 3.30 GHz) with multiple processes, with each process simulating one machine. |
| Software Dependencies | No | The paper mentions 'all methods are implemented in Matlab' but does not specify a version number for Matlab or any other specific software libraries or dependencies with version numbers. |
| Experiment Setup | Yes | We choose λ = 10⁻⁴, d = 50, N = 10⁴, ω ∈ {0, 0.3, 0.5} and m ∈ {10, 20}. [...] In this experiment, we choose α = 1 [...] In the GD method, we use a step length of η = 1/L. [...] We choose the regularization parameter λ to be 1/N^0.5, 1/N^0.75 and 1/N. For each setting, L is computed as max_{i=1,...,N} ‖a_i‖² γ + λ [...] We implement DSVRG by choosing η = 1/L, T = 10,000 and K = N/T. For DASVRG, we choose η = 1/L, T = 10,000, K = 1 and P = N/T. |
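The Experiment Setup row can be made concrete with a short sketch of how the quoted hyperparameters fit together. This is an illustrative reconstruction, not the authors' code: the function name `dsvrg_hyperparams` is hypothetical, and it assumes the quoted smoothness constant is L = max_i ‖a_i‖² γ + λ (with γ the smoothness constant of the per-example loss), from which the paper's choices η = 1/L and K = N/T follow.

```python
import numpy as np

def dsvrg_hyperparams(A, lam, gamma, T=10_000):
    """Hypothetical helper mirroring the quoted experiment setup.

    A     : (N, d) data matrix with rows a_i
    lam   : regularization parameter lambda (e.g. 1/N**0.5)
    gamma : smoothness constant of the loss (assumption: the quoted
            formula L = max_i ||a_i||^2 * gamma + lam)
    T     : inner-loop iterations per stage (paper uses 10,000)
    """
    N = A.shape[0]
    # Smoothness bound over the dataset, per the quoted formula.
    L = np.max(np.sum(A ** 2, axis=1)) * gamma + lam
    eta = 1.0 / L      # step size eta = 1/L, as in the paper
    K = N // T         # number of DSVRG stages K = N/T
    return {"L": L, "eta": eta, "T": T, "K": K}

# Toy example with synthetic data: N = 40,000 samples, d = 50 features.
rng = np.random.default_rng(0)
A = rng.standard_normal((40_000, 50))
params = dsvrg_hyperparams(A, lam=1 / 40_000 ** 0.5, gamma=0.25)
```

With N = 40,000 and T = 10,000 this yields K = 4 stages; the regularization settings 1/N^0.75 and 1/N from the quote would be swapped in the same way via the `lam` argument.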