Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning

Authors: Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

JMLR 2019

Reproducibility assessment (variable: result — LLM response):
Research Type: Experimental — "We implement the proposed technique on Amazon EC2 clusters, and demonstrate its performance over several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies." Numerical results are reported.
Researcher Affiliation: Collaboration — Can Karakus (Amazon Web Services, East Palo Alto, CA 94303, USA); Yifan Sun (Department of Computer Science, University of British Columbia, Vancouver, BC, Canada); Suhas Diggavi (Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA 90095, USA); Wotao Yin (Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA)
Pseudocode: Yes — Algorithm 1: Generic encoded distributed optimization procedure under data parallelism, at the master node; Algorithm 2: Generic encoded distributed optimization procedure under data parallelism, at worker node i; Algorithm 3: Encoded block coordinate descent at worker node i; Algorithm 4: Encoded block coordinate descent at the master node.
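The master/worker scheme of Algorithms 1 and 2 can be illustrated with a minimal simulation: the data is multiplied by a redundant encoding matrix S, the encoded rows are partitioned across m workers, and at each step the master aggregates gradients from only the k fastest workers, ignoring stragglers. The sketch below is a toy NumPy stand-in, not the authors' implementation: the Gaussian S, the least-squares objective, the step size, and the random choice of "fast" workers are all illustrative assumptions (the paper uses structured encoding matrices with S^T S ≈ I).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem min_w ||Xw - y||^2 standing in for the paper's setups.
n, d = 64, 8          # samples, features
m, k = 8, 6           # workers, number of fastest workers the master waits for
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true        # noiseless, so the encoded problem shares the minimizer

# Redundant encoding (hypothetical Gaussian code with redundancy factor beta = 2,
# scaled so that E[S^T S] = I).
beta = 2
S = rng.standard_normal((beta * n, n)) / np.sqrt(beta * n)
X_enc, y_enc = S @ X, S @ y

# Partition the encoded rows across the m workers.
blocks = np.array_split(np.arange(beta * n), m)

def worker_grad(w, rows):
    """Gradient of the local encoded least-squares term held by one worker."""
    Xi, yi = X_enc[rows], y_enc[rows]
    return 2 * Xi.T @ (Xi @ w - yi)

w = np.zeros(d)
step = 0.05 / n
for _ in range(500):
    # The master proceeds with the k fastest workers; "fastest" is simulated
    # here by a uniformly random subset, and the m - k stragglers are ignored.
    fast = rng.choice(m, size=k, replace=False)
    g = (m / k) * sum(worker_grad(w, blocks[i]) for i in fast)
    w -= step * g
```

Because the encoded residual S(Xw* − y) vanishes at the true minimizer, discarding stragglers still leaves an unbiased descent direction, which is the intuition the redundancy is meant to capture.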
Open Source Code: No — The paper provides no explicit statement of, or link to, open-source code for the described methodology; it states only: "We implement the proposed technique on Amazon EC2 clusters".
Open Datasets: Yes — "We next apply matrix factorization on the MovieLens-1M dataset (Riedl and Konstan (1998)) for the movie recommendation task." ... "In our next experiment, we apply logistic regression for document classification for Reuters Corpus Volume 1 (rcv1.binary) dataset from Lewis et al. (2004)"
Dataset Splits: Yes — "We withhold randomly 20% of these ratings to form an 80/20 train/test split." ... "reserve 100,000 documents for the test set."
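The quoted 80/20 split can be sketched as follows; the synthetic `ratings` array and seed are hypothetical stand-ins for the MovieLens-1M ratings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical (user, movie, rating) triples standing in for MovieLens-1M,
# which holds ~1M ratings of 1-5 stars over 6040 users and 3952 movies.
n_ratings = 1000
ratings = np.column_stack([
    rng.integers(0, 6040, n_ratings),   # user id
    rng.integers(0, 3952, n_ratings),   # movie id
    rng.integers(1, 6, n_ratings),      # rating in 1..5
])

# Withhold a random 20% of the ratings to form the 80/20 train/test split.
perm = rng.permutation(n_ratings)
n_test = n_ratings // 5
test_idx, train_idx = perm[:n_test], perm[n_test:]
train, test = ratings[train_idx], ratings[test_idx]
```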
Hardware Specification: Yes — "We implement distributed L-BFGS as described in Section 3 on an Amazon EC2 cluster using mpi4py Python package, over m = 32 m1.small instances as worker nodes, and a single c3.8xlarge instance as the central server." ... "The MovieLens experiment is run on a single 32-core machine with Linux 4.4." ... "We implement the algorithm over 128 t2.medium worker nodes which collectively store the matrix X, and a c3.4xlarge master node." ... "We use m = 128 t2.medium instances as worker nodes, and a single c3.4xlarge instance as the master node" ... "we use compute-optimized, high-performance, high-bandwidth c4.4xlarge and c4.large instances available through Amazon EC2"
Software Dependencies: No — The paper mentions software such as the mpi4py Python package and numpy.linalg.solve, but it does not specify version numbers for these components or for any other libraries/frameworks used.
Experiment Setup: Yes — "We choose b = 3, p = 15, and λ = 10, which achieves test RMSE 0.861" ... "We choose λ = 0.6 and consider the sparsity recovery performance" ... "We use logistic regression with ℓ2-regularization for the classification task, with the objective min_{w,b} (1/n) Σ_{i=1}^{n} log(1 + exp(z_iᵀw + b)) + λ‖w‖²" ... "We compare the methods against each other by measuring the wall-clock time of optimization required to achieve a fixed mean-squared error." ... "we determine to be λ = 0.025. Then, we set the MSE bar as 1.05 MMSE." ... "In all cases, we use gradient descent with step size α_t = 0.2, which is run for 120 steps."
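The quoted ℓ2-regularized logistic regression objective and gradient-descent setup (step size 0.2, 120 steps) can be sketched as follows on synthetic data. The feature matrix, labels, and λ = 0.01 are illustrative assumptions (the paper tunes λ per task), and the loss is written in margin form with z_i = -y_i(x_iᵀw + b) so each term reads log(1 + exp(z_i)).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, nearly separable data standing in for rcv1.binary.
n, d = 200, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star + 0.1 * rng.standard_normal(n))  # labels in {-1, +1}

lam = 0.01  # illustrative regularization weight; the paper tunes it per task

def loss_and_grads(w, b):
    z = -y * (X @ w + b)                # margin form: loss term is log(1 + exp(z_i))
    p = np.exp(z - np.logaddexp(0, z))  # numerically stable sigmoid(z)
    loss = np.mean(np.logaddexp(0, z)) + lam * (w @ w)
    gw = -(X.T @ (y * p)) / n + 2 * lam * w
    gb = -np.mean(y * p)
    return loss, gw, gb

# Plain gradient descent with the quoted step size and iteration count.
w, b = np.zeros(d), 0.0
alpha = 0.2
for _ in range(120):
    loss, gw, gb = loss_and_grads(w, b)
    w -= alpha * gw
    b -= alpha * gb

loss, _, _ = loss_and_grads(w, b)           # final objective value
accuracy = np.mean(np.sign(X @ w + b) == y)  # training accuracy
```

Using `logaddexp` for both the loss and the sigmoid avoids overflow when margins grow large, which matters once the iterates start separating the data.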