Buffered Asynchronous SGD for Byzantine Learning
Authors: Yi-Rui Yang, Wu-Jun Li
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our methods significantly outperform existing ABL baselines when there exist failures or attacks on workers. In this section, we empirically evaluate the performance of BASGD (BASGDm) and baselines in both image classification (IC) and natural language processing (NLP) applications. |
| Researcher Affiliation | Academia | Yi-Rui Yang EMAIL Wu-Jun Li EMAIL National Key Laboratory for Novel Software Technology Department of Computer Science and Technology Nanjing University, Nanjing 210023, China |
| Pseudocode | Yes | Algorithm 1 Buffered Asynchronous SGD (BASGD) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only mentions that the algorithms are implemented with PyTorch 1.3, without a release statement or repository link. |
| Open Datasets | Yes | In the experiment, algorithms are evaluated on CIFAR-10 (Krizhevsky et al., 2009) with deep learning model ResNet-20 (He et al., 2016). (...) In our NLP experiment, the methods are evaluated on the WikiText-2 dataset with an LSTM (Hochreiter and Schmidhuber, 1997) network. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 and WikiText-2 datasets and states 'The training set is randomly and equally distributed to different workers' and 'We only use the training set and test set, while the validation set is not used in our experiment.' However, it does not provide specific percentages, sample counts, or explicit references to predefined train/test/validation splits used for the experiments. |
| Hardware Specification | Yes | Our experiments are conducted on a distributed platform with dockers. Each docker is bound to an NVIDIA Tesla V100 (32G) GPU. |
| Software Dependencies | Yes | All algorithms are implemented with PyTorch 1.3. |
| Experiment Setup | Yes | We set momentum hyper-parameter µ = 0.9 for BASGDm and ASGDm in each experiment. (...) learning rate η is set to 0.1 initially for each algorithm, and multiplied by 0.1 at the 80th epoch and the 120th epoch respectively. The weight decay is set to 10⁻⁴. We run each algorithm for 160 epochs. The batch size is set to 25. |
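Since the paper's Algorithm 1 (Buffered Asynchronous SGD) is reported only as pseudocode with no released code, a minimal illustrative sketch may help convey the general idea of a buffered server. This is not a reproduction of the authors' Algorithm 1: the buffer-assignment rule (worker id modulo the number of buffers) and the coordinate-wise median aggregation used here are assumptions chosen for illustration.

```python
import numpy as np

class BufferedServer:
    """Sketch of a server that holds B buffers of asynchronously received
    gradients and only updates once every buffer has been filled.
    Assumption for illustration: gradients are routed to buffers by
    worker id modulo B, and buffers are combined with a coordinate-wise
    median, which tolerates a minority of corrupted buffer values."""

    def __init__(self, num_buffers, dim):
        self.B = num_buffers
        self.dim = dim
        self.buffers = [None] * num_buffers

    def receive(self, worker_id, grad):
        # A newer gradient from the same worker group overwrites the buffer.
        self.buffers[worker_id % self.B] = np.asarray(grad, dtype=float)

    def ready(self):
        # An update step is possible only when every buffer is non-empty.
        return all(b is not None for b in self.buffers)

    def aggregate(self):
        # Coordinate-wise median across buffers, then clear all buffers.
        stacked = np.stack(self.buffers)
        self.buffers = [None] * self.B
        return np.median(stacked, axis=0)
```

With three buffers, one Byzantine gradient (e.g. `[100.0, 100.0]` among `[1.0, 1.0]` and `[2.0, 2.0]`) is discarded by the median, which returns `[2.0, 2.0]`.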
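The reported experiment setup fully determines the learning-rate schedule, so it can be sketched directly: η starts at 0.1 and is multiplied by 0.1 at epochs 80 and 120, over 160 epochs. The helper below is a hypothetical reconstruction of that step-decay schedule, not code from the paper.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step-decay schedule matching the reported setup: lr = 0.1,
    multiplied by 0.1 at the 80th and 120th epochs (0-indexed)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# One learning-rate value per epoch for the full 160-epoch run.
schedule = [learning_rate(e) for e in range(160)]
```

In a PyTorch 1.3 setup like the one the paper describes, the same schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)` on top of `torch.optim.SGD` with `momentum=0.9` and `weight_decay=1e-4`.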