On Biased Compression for Distributed Learning
Authors: Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sections 6.1–6.4, we present our experiments, which are primarily focused on supporting our theoretical findings. Therefore, we simulate these experiments on one machine, which enables rapid direct comparisons against the prior methods. In more detail, we use a machine with 24 Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz cores and a GeForce GTX 1080 Ti GPU. Section 6.5 is devoted to real experiments with a large model and big data. For these experiments, we use a computational cluster with 10 Tesla T4 GPUs. We implement all methods in Python 3.7 using PyTorch (Paszke et al., 2019). |
| Researcher Affiliation | Academia | Aleksandr Beznosikov EMAIL Computer, Electrical and Math. Sciences and Engineering Division King Abdullah University of Science and Technology, 23955, Thuwal, KSA Skolkovo Institute of Science and Technology, 121205, Moscow, Russia School of Applied Mathematics and Informatics Moscow Institute of Physics and Technology, 141701, Moscow, Russia |
| Pseudocode | Yes | Algorithm 1 Distributed SGD with Biased Compression and Error Feedback |
| Open Source Code | No | The paper states: "We implement all methods in Python 3.7 using Pytorch Paszke et al. (2019)." However, it does not provide an explicit statement about releasing their own source code for the methodology described in the paper, nor does it include a link to a repository. |
| Open Datasets | Yes | Practical distribution. We obtained various gradient distributions via logistic regression (mushrooms LIBSVM dataset) and least squares. We run 2 sets of experiments with ResNet18 on the CIFAR10 dataset. Figure 4 displays training/test loss and accuracy for VGG19 on CIFAR10 with data equally distributed among 4 nodes. We train ALBERT-large (Lan et al., 2020) (18M parameters) with layer sharing on a combination of the Bookcorpus (Zhu et al., 2015) and Wikipedia (Devlin et al., 2018) datasets. For the second experiment shown in Figure 8, we run standard linear regression on two scikit-learn datasets, Boston and Diabetes, and apply data normalization as the preprocessing step. |
| Dataset Splits | Yes | Figure 4 displays training/test loss and accuracy for VGG19 on CIFAR10 with data equally distributed among 4 nodes. We train ALBERT-large (Lan et al., 2020) (18M parameters) with layer sharing on a combination of Bookcorpus (Zhu et al., 2015) and Wikipedia (Devlin et al., 2018) datasets. We measure how the training loss changes (Figure 9) as well as at the end of training we evaluate the final performance for each model on several popular tasks from (Wang et al., 2018) (Table 5). |
| Hardware Specification | Yes | We use a machine with 24 Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz cores and a GeForce GTX 1080 Ti GPU. Section 6.5 is devoted to real experiments with a large model and big data. For these experiments, we use a computational cluster with 10 Tesla T4 GPUs. |
| Software Dependencies | No | The paper states: "We implement all methods in Python 3.7 using Pytorch Paszke et al. (2019)." While Python 3.7 is mentioned with a version, a specific version for PyTorch is not provided. "Paszke et al. (2019)" refers to the paper introducing PyTorch, not a version number used in this work. |
| Experiment Setup | Yes | We use plain SGD with a default step size equal to 0.01 for all methods, i.e. Top-5 with and without error feedback, Rand-5 and no compression. We use 2 levels with infinity norm for natural dithering and k = 5 for sparsification methods. For all the compression operators, we train VGG11 on CIFAR10 with plain SGD as an optimizer and default step size equal to 0.01. We use the same optimizer (LAMB) and the same tuning for it as in the original paper (Lan et al., 2020). |