Buffered Asynchronous SGD for Byzantine Learning

Authors: Yi-Rui Yang, Wu-Jun Li

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our methods significantly outperform existing ABL baselines when there are failures or attacks on workers. In this section, we empirically evaluate the performance of BASGD (BASGDm) and baselines in both image classification (IC) and natural language processing (NLP) applications.
Researcher Affiliation | Academia | Yi-Rui Yang (EMAIL), Wu-Jun Li (EMAIL); National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China
Pseudocode | Yes | Algorithm 1: Buffered Asynchronous SGD (BASGD)
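The pseudocode (Algorithm 1) describes a server that collects asynchronously arriving gradients into B buffers and updates the parameter only when every buffer is non-empty, aggregating the buffer averages robustly. Below is a minimal sketch of that buffer mechanism, assuming coordinate-wise median as the aggregation rule and worker-id-modulo-B buffer assignment; class and method names are illustrative, not the paper's implementation.

```python
def coordinate_wise_median(vectors):
    """Coordinate-wise median of a list of equal-length vectors."""
    dim = len(vectors[0])
    med = []
    for j in range(dim):
        vals = sorted(v[j] for v in vectors)
        n = len(vals)
        med.append(vals[n // 2] if n % 2 == 1 else 0.5 * (vals[n // 2 - 1] + vals[n // 2]))
    return med

class BASGDServer:
    """Sketch of a BASGD-style server (illustrative, not the paper's code).

    Incoming gradients are assigned to one of B buffers (here by worker id
    modulo B). The parameter is updated only when every buffer is non-empty,
    using a robust aggregation (coordinate-wise median) of the per-buffer
    averages; all buffers are then cleared.
    """

    def __init__(self, dim, num_buffers, lr):
        self.w = [0.0] * dim            # model parameter
        self.lr = lr                    # learning rate
        self.buffers = [[] for _ in range(num_buffers)]

    def receive(self, worker_id, grad):
        # Store the gradient in its buffer; update once all buffers are filled.
        b = worker_id % len(self.buffers)
        self.buffers[b].append(grad)
        if all(buf for buf in self.buffers):
            self._update()

    def _update(self):
        # Average within each buffer, then aggregate robustly across buffers.
        means = [
            [sum(g[j] for g in buf) / len(buf) for j in range(len(self.w))]
            for buf in self.buffers
        ]
        agg = coordinate_wise_median(means)
        self.w = [wj - self.lr * gj for wj, gj in zip(self.w, agg)]
        self.buffers = [[] for _ in self.buffers]
```

Buffering is what lets the asynchronous server apply a robust aggregation rule at all: a single arriving gradient cannot be compared against peers, but B buffer averages can.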
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It only mentions that the algorithms are implemented with PyTorch 1.3, without a release statement or link.
Open Datasets | Yes | In the experiment, algorithms are evaluated on CIFAR-10 (Krizhevsky et al., 2009) with the deep learning model ResNet-20 (He et al., 2016). (...) In our NLP experiment, the methods are evaluated on the WikiText-2 dataset with an LSTM (Hochreiter and Schmidhuber, 1997) network.
Dataset Splits | No | The paper states that 'The training set is randomly and equally distributed to different workers' and that 'We only use the training set and test set, while the validation set is not used in our experiment.' However, it does not provide specific percentages, sample counts, or explicit references to predefined train/validation/test splits used for the experiments.
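The quoted statement that the training set is "randomly and equally distributed to different workers" can be sketched as a simple index partition; the function name and seed handling below are illustrative assumptions, not the paper's code.

```python
import random

def partition_to_workers(indices, num_workers, seed=0):
    """Randomly and equally distribute training-sample indices among workers.

    Shuffles the index list with a fixed seed (an assumption for
    reproducibility of this sketch) and slices it into equal shards,
    one per worker. Any remainder from uneven division is dropped.
    """
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    per = len(idx) // num_workers
    return [idx[i * per:(i + 1) * per] for i in range(num_workers)]
```

For example, partitioning CIFAR-10's 50,000 training samples across 8 workers would give each worker a disjoint shard of 6,250 indices; in a PyTorch setup the same effect is typically achieved with a per-worker sampler.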
Hardware Specification | Yes | Our experiments are conducted on a distributed platform with Docker containers. Each container is bound to an NVIDIA Tesla V100 (32 GB) GPU.
Software Dependencies | Yes | All algorithms are implemented with PyTorch 1.3.
Experiment Setup | Yes | We set the momentum hyper-parameter µ = 0.9 for BASGDm and ASGDm in each experiment. (...) The learning rate η is set to 0.1 initially for each algorithm, and multiplied by 0.1 at the 80th epoch and the 120th epoch respectively. The weight decay is set to 10^-4. We run each algorithm for 160 epochs. The batch size is set to 25.
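The learning-rate schedule described above (start at 0.1, multiply by 0.1 at the 80th and 120th epochs) is a standard step decay; a minimal sketch, with the function name and defaults as illustrative assumptions:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step-decay schedule matching the described setup: the learning rate
    starts at base_lr and is multiplied by gamma at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

So over the 160 training epochs, the rate is 0.1 for epochs 0-79, 0.01 for epochs 80-119, and 0.001 thereafter; in PyTorch 1.3 the same schedule is typically expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)`.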