GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning

Authors: Anis Elgabli, Jihong Park, Amrit S. Bedi, Mehdi Bennis, Vaneet Aggarwal

JMLR 2020

Reproducibility Variable: Result (followed by the LLM response)

Research Type: Experimental
  "To validate our theoretical foundations, we numerically evaluate the performance of GADMM in linear and logistic regression tasks, compared with the following benchmark algorithms."

Researcher Affiliation: Academia
  "Anis Elgabli EMAIL Jihong Park EMAIL Amrit S. Bedi EMAIL Mehdi Bennis EMAIL Vaneet Aggarwal EMAIL"

Pseudocode: Yes
  "The detailed steps of the proposed algorithm are summarized in Algorithm 1. ... The detailed steps of D-GADMM is described in Algorithm 2."

Open Source Code: No
  No explicit statement or link for the paper's source code is provided.

Open Datasets: Yes
  "All simulations are conducted using the synthetic and real datasets described in (Dua and Graff, 2017; Chen et al., 2018). ... Next, the real data tests linear and logistic regression tasks with Body Fat (252 samples, 14 features) and Derm (358 samples, 34 features) datasets (Dua and Graff, 2017), respectively. ... Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml."

Dataset Splits: No
  The paper mentions that data is "evenly split into workers" (e.g., "We consider 1,200 samples with 50 features, which are evenly split into workers.") and gives the number of workers for the synthetic and real datasets, but it does not provide training/test/validation splits (percentages or counts) or reference standard splits for model evaluation.

Hardware Specification: No
  The paper discusses communication-cost metrics, bandwidth (B = 2 MHz), noise spectral density (N0 = 1e-6), and power consumption for communication, but it does not specify the hardware used for computation (e.g., CPU or GPU models, or memory) in the machine learning experiments.

Software Dependencies: No
  "For the tuning parameters, we use the setup in (Chen et al., 2018)." No software dependencies with version numbers are provided for the implementation of GADMM or D-GADMM.

Experiment Setup: Yes
  "For linear regression with the synthetic dataset, Fig. 2 shows that all variants of GADMM with ρ = 3, 5, and 7 achieve the target objective error of 10^-4 in less than 1,000 iterations... For logistic regression, Figs. 4 and 5 validate that GADMM outperforms the benchmark algorithms... D-GADMM, ρ = 1, coherence time = 15 iter... D-GADMM, refresh rate = 1, D-GADMM, refresh rate = 10, D-GADMM, refresh rate = 50."
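Since the paper releases no code, the quoted setup (linear regression, data evenly split across workers arranged in a chain, alternating head/tail updates with penalty ρ) can only be approximated. Below is a minimal NumPy sketch of a GADMM-style chain for least squares under those assumptions; `gadmm_linear_regression` and all variable names are illustrative inventions, not the authors' implementation.

```python
import numpy as np

def gadmm_linear_regression(X_parts, y_parts, rho=3.0, iters=500):
    """Illustrative GADMM-style chain for least squares (not the paper's code).

    Worker n holds (X_parts[n], y_parts[n]) and a local model theta[n],
    with chain constraints theta[n] = theta[n+1]. Even-indexed ("head")
    workers update first, then odd-indexed ("tail") workers, so each worker
    only ever communicates with its two neighbours.
    """
    N = len(X_parts)
    d = X_parts[0].shape[1]
    theta = [np.zeros(d) for _ in range(N)]
    lam = [np.zeros(d) for _ in range(N - 1)]  # dual for theta[n] = theta[n+1]

    def local_update(n):
        # Closed-form minimizer of the local augmented Lagrangian:
        # (X^T X + c*rho*I) theta = X^T y + lam[n-1] - lam[n]
        #                           + rho * (sum of neighbour models),
        # where c is the number of neighbours (1 at the chain ends, else 2).
        X, y = X_parts[n], y_parts[n]
        rhs = X.T @ y
        c = 0
        if n > 0:
            rhs += lam[n - 1] + rho * theta[n - 1]
            c += 1
        if n < N - 1:
            rhs += -lam[n] + rho * theta[n + 1]
            c += 1
        theta[n] = np.linalg.solve(X.T @ X + c * rho * np.eye(d), rhs)

    for _ in range(iters):
        for n in range(0, N, 2):   # head workers update in parallel
            local_update(n)
        for n in range(1, N, 2):   # then tail workers, using fresh neighbours
            local_update(n)
        for n in range(N - 1):     # dual ascent on each chain constraint
            lam[n] += rho * (theta[n] - theta[n + 1])
    return theta
```

On a small noiseless synthetic problem (e.g., four workers with 40 Gaussian samples of dimension 5 each), the per-worker models reach consensus at the centralized least-squares solution well within the iteration budgets quoted above.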