Optimal Complexity in Byzantine-Robust Distributed Stochastic Optimization with Data Heterogeneity
Authors: Qiankun Shi, Jie Peng, Kun Yuan, Xiao Wang, Qing Ling
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive numerical experiments to evaluate the performance of Algorithm 1. Here we do not consider Algorithms 2 and 3, which exhibit strong theoretical guarantees at the cost of complicated hyperparameter tuning. Experimental setup. We consider two tasks, logistic regression and convolutional neural network training. For the first task, we consider a distributed network of 10 nodes within which 2 are Byzantine. For the second task, we consider a distributed network of 30 nodes within which 5 are Byzantine. The training dataset is MNIST with 10 classes, each having 6,000 training samples. |
| Researcher Affiliation | Academia | Qiankun Shi (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China; Pengcheng Laboratory, Shenzhen, China); Jie Peng (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China); Kun Yuan (Center for Machine Learning Research, Peking University, Beijing, China); Xiao Wang (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China); Qing Ling (School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, China) |
| Pseudocode | Yes | Algorithm 1: Byzantine-robust distributed stochastic Nesterov's accelerated method with variance reduction (Byrd-Nester); Algorithm 2: Byrd-Nester with restart (Byrd-reNester); Algorithm 3: Inexact Proximal Point Algorithm with Byrd-reNester |
| Open Source Code | Yes | More results can be found via running our source code at https://github.com/sqkkk/Byrd-Nester. |
| Open Datasets | Yes | The training dataset is MNIST with 10 classes, each having 6,000 training samples. |
| Dataset Splits | No | The paper describes how the training dataset is distributed and shuffled among honest nodes for heterogeneous data distribution, but it does not explicitly provide information on standard training/validation/test splits with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper discusses running 'extensive numerical experiments' and mentions 'a distributed network of 10 nodes' and '30 nodes' but does not specify any particular hardware components such as CPU models, GPU models, or memory configurations. |
| Software Dependencies | No | The paper mentions implementing 'Byzantine-robust distributed mini-batch SGD (DSGD) and its momentum variant (DSGDm, Karimireddy et al. (2022)) as the baselines', along with 'fourteen robust aggregation rules' and 'nine Byzantine attacks'. However, it does not provide specific version numbers for any software, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | The step size is set to 0.1, the batch size is 32, and the total number of epochs is 45. |
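The setup quoted above (a distributed network in which a minority of nodes are Byzantine, with gradients combined by a robust aggregation rule) can be sketched in a few lines. This is a minimal illustration of one aggregation round, not the paper's Byrd-Nester method: it uses the logistic-regression network sizes reported above (10 nodes, 2 Byzantine) with coordinate-wise median as a stand-in robust rule, and the gradient vectors and attack are hypothetical placeholders.

```python
import numpy as np

def coordinate_wise_median(grads):
    """Robustly aggregate worker gradients: median of each coordinate."""
    return np.median(np.stack(grads), axis=0)

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])  # illustrative "honest" gradient

# 8 honest workers send noisy copies of the true gradient.
honest = [true_grad + 0.01 * rng.standard_normal(3) for _ in range(8)]
# 2 Byzantine workers send arbitrary vectors (here, scaled sign flips).
byzantine = [-10.0 * true_grad for _ in range(2)]

aggregated = coordinate_wise_median(honest + byzantine)
naive_mean = np.mean(np.stack(honest + byzantine), axis=0)

# The plain mean is dragged far from the true gradient by the two
# attackers, while the coordinate-wise median stays close to it.
print(np.linalg.norm(aggregated - true_grad),
      np.linalg.norm(naive_mean - true_grad))
```

With 8 honest out of 10 workers, the median of each coordinate falls among the honest values, which is why such rules tolerate a Byzantine minority.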