DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs
Authors: Chaouki Ben Issaid, Anis Elgabli, Mehdi Bennis
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results show that our proposed algorithm can improve the worst distribution test accuracy by up to 10%. Moreover, DR-DSGD is more communication-efficient than DSGD since it requires fewer communication rounds (up to 20 times less) to achieve the same worst distribution test accuracy target. Furthermore, the conducted experiments reveal that DR-DSGD results in a fairer performance across devices in terms of test accuracy. The section "6 Experiments" details the empirical evaluation of the proposed method. |
| Researcher Affiliation | Academia | Chaouki Ben Issaid EMAIL Centre for Wireless Communications University of Oulu, Finland Anis Elgabli EMAIL Centre for Wireless Communications University of Oulu, Finland Mehdi Bennis EMAIL Centre for Wireless Communications University of Oulu, Finland |
| Pseudocode | Yes | Algorithm 1 Vanilla Decentralized SGD (DSGD) and Algorithm 2 Distributionally Robust Decentralized SGD (DR-DSGD) are presented in the paper, providing structured pseudocode. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. It only mentions implementing algorithms using PyTorch. |
| Open Datasets | Yes | For our experiments, we consider the image classification task using two main datasets: Fashion MNIST (Xiao et al., 2017) and CIFAR10 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For each dataset, we distribute the data across the K devices in a pathological non-IID way, as in (McMahan et al., 2017), to mimic an actual decentralized learning setup. More specifically, we first order the samples according to the labels and divide data into shards of equal sizes. Finally, we assign each device the same number of chunks. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. It only mentions implementing algorithms using PyTorch. |
| Software Dependencies | No | The paper mentions "PyTorch" for implementing algorithms and the "networkx package (Hagberg et al., 2008)" for graph generation, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For Fashion MNIST, we use an MLP model with ReLU activations having two hidden layers with 128 and 64 neurons, respectively. For the CIFAR10 dataset, we use a CNN model composed of three convolutional layers followed by two fully connected layers, each having 500 neurons. [...] Unless explicitly stated, we choose the learning rate η = √(K/T) and the mini-batch size B = √T. [...] In this section, we consider K = 10 devices and µ = 6. For Fashion MNIST, we consider a value of p = 0.3 while we take p = 0.5 for CIFAR10. [...] From this section on, we consider K = 25. To investigate the fairness of the performance across the devices, we run the experiments on Fashion MNIST and CIFAR10 datasets reporting the final test accuracy on each device in the case when µ = 9. |
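
The pathological non-IID split quoted under "Dataset Splits" (sort samples by label, cut into equal-size shards, give each device the same number of shards) can be sketched as follows. This is a hedged reconstruction of the McMahan et al. (2017) scheme, not the authors' code; the function name, the fixed random seed, and the shard-shuffling step are assumptions for illustration.

```python
import numpy as np

def pathological_noniid_split(labels, num_devices, shards_per_device, seed=0):
    """Sketch of a pathological non-IID partition (McMahan et al., 2017):
    sort sample indices by label, cut them into equal-size shards, then
    assign each device the same number of randomly chosen shards, so each
    device sees only a few distinct classes."""
    order = np.argsort(labels)                         # indices sorted by label
    num_shards = num_devices * shards_per_device
    shards = np.array_split(order, num_shards)         # equal-size label-sorted shards
    rng = np.random.default_rng(seed)                  # seed is an assumption, for reproducibility
    shard_ids = rng.permutation(num_shards)            # shuffle which shards go where
    return [
        np.concatenate(
            [shards[s] for s in
             shard_ids[k * shards_per_device:(k + 1) * shards_per_device]]
        )
        for k in range(num_devices)
    ]
```

With `shards_per_device=2`, each device ends up holding samples from at most two classes, which reproduces the highly skewed label distribution the paper uses to stress-test distributional robustness.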