Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks
Authors: Alireza Fallah, Mert Gürbüzbalaban, Asuman Ozdaglar, Umut Şimşekli, Lingjiong Zhu
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct several experiments to validate our theory and assess the performance of D-SG and D-ASG. We consider a (regularized) logistic regression problem, which is a common formulation for binary classification tasks: min_x (1/n) Σ_{i=1}^{n} log(1 + exp(−y_i X_i^⊤ x)) + (λ/2)‖x‖₂², where (X_i, y_i) denotes a data pair: X_i ∈ R^d is the feature vector, y_i ∈ {−1, 1} denotes the label, and n denotes the number of data pairs. In all our experiments, we assume that each computation node has access to a subset of all data points, and the noisy gradient in (6) and (13) becomes the stochastic gradient computed on a random sub-sample of the data. More precisely, we assume that at each iteration, each computation node draws a random sub-sample from the data points it has access to and computes the stochastic gradient using this sub-sample. |
| Researcher Affiliation | Academia | Alireza Fallah* EMAIL, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America; Mert Gürbüzbalaban* EMAIL, Department of Management Science and Information Systems, Departments of Electrical Engineering and Statistics, Rutgers University, Piscataway, NJ 08854, United States of America; Asuman Ozdaglar* EMAIL, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America; Umut Şimşekli* EMAIL, INRIA, Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France; Lingjiong Zhu* EMAIL, Department of Mathematics, Florida State University, Tallahassee, FL 32306, United States of America |
| Pseudocode | Yes | Algorithm 1: Distributed Multistage Accelerated Stochastic Gradient Algorithm (D-MASG). Input: initial iterate x(0), the sequence {α_t}_{t=1}^T of stepsizes, the sequence {k_t}_{t=1}^T of stage lengths. Set x(0, k_0) = x(0); for t = 1, …, T do: set x(t, −1) = x(t, 0) = x(t−1, k_{t−1}); for m = 0, …, k_t − 1 do: set β_t = (1 − √(µα_t)) / (1 + √(µα_t)); set y(t, m) = (1 + β_t) x(t, m) − β_t x(t, m−1); set x(t, m+1) = W y(t, m) − α_t ∇̃f(y(t, m)). |
| Open Source Code | Yes | In this set of experiments, we simulate the distributed environment in MATLAB and we provide our implementation in the supplementary material. |
| Open Datasets | Yes | The MNIST data set contains 70K binary images (of size d = 20 × 20) corresponding to 10 different digits (footnote 7: http://yann.lecun.com/exdb/mnist). To obtain a binary classification problem, we extract the images corresponding to the digits 0 and 8, where we end up with n = 11774 images in total. On the other hand, the Epsilon data set is one of the standard binary classification data sets (footnote 8: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html), and contains n = 400K samples with d = 2000. |
| Dataset Splits | No | The paper mentions a "batch proportion b = 0.1" used for computing stochastic gradients, but this refers to sub-sampling for gradient estimation, not explicit training/validation/test dataset splits. There is no mention of specific percentages, counts, or methodologies for partitioning the datasets into training, validation, or test sets. |
| Hardware Specification | No | We have implemented all the algorithms in C++ by using a low-level message passing protocol for parallel processing, namely the Open MPI library. In order to have a realistic experimental environment, we have conducted these experiments on a cluster of interconnected computers, each of which is equipped with different quality CPUs and memories. |
| Software Dependencies | No | In this set of experiments, we simulate the distributed environment in MATLAB and we provide our implementation in the supplementary material. We have implemented all the algorithms in C++ by using a low-level message passing protocol for parallel processing, namely the Open MPI library (footnote 9: https://www.open-mpi.org). The paper mentions MATLAB, C++, and the Open MPI library, but it does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | Unless stated otherwise, we first generate n = 1000 data points, set the dimension d = 100, data variance σ²_X = 5, λ = 0.05, the number of nodes N = 10, the batch proportion b = 0.1, α = 0.1, and we consider the circular network architecture. In this problem, we first compute an estimate for L and µ from the data, where we obtain L ≈ 50 and µ ≈ 0.1. Then, we vary L from 5 to 500 by fixing µ = 0.1, and we vary µ from 0.01 to 1 by fixing L = 50. Accordingly, we run the algorithms with hyperparameters that are computed with these values for L and µ. For D-SG we set the stepsize α = (1 + λ_W^N)/L, whereas for D-ASG we set α = λ_W^N/L and β = (1 − √(αµ)) / (1 + √(αµ)). Finally, for D-MASG we set κ = (L + µ)/(µ λ_W^N) as in (38) and use the setting reported in Proposition 14. |
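The mini-batch stochastic gradient described in the Research Type excerpt (a random sub-sample of each node's data, with batch proportion b) can be sketched in NumPy. This is an illustrative reconstruction, not the authors' MATLAB or C++ implementation; the function name `stochastic_gradient` and the argument layout are assumptions.

```python
import numpy as np

def stochastic_gradient(x, X, y, lam, batch_prop, rng):
    """Mini-batch stochastic gradient of the regularized logistic loss
    (1/n) * sum_i log(1 + exp(-y_i * X_i^T x)) + (lam/2) * ||x||^2.
    `batch_prop` plays the role of the batch proportion b in the paper."""
    n = X.shape[0]
    batch = rng.choice(n, size=max(1, int(batch_prop * n)), replace=False)
    Xb, yb = X[batch], y[batch]
    margins = yb * (Xb @ x)
    # d/dx log(1 + exp(-m_i)) = -y_i * X_i / (1 + exp(m_i))
    coeffs = -yb / (1.0 + np.exp(margins))
    return Xb.T @ coeffs / len(batch) + lam * x
```

With `batch_prop = 1.0` this reduces to the full-batch gradient, which is a convenient sanity check against finite differences.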
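The D-MASG pseudocode quoted in the table can likewise be sketched as a NumPy loop over stages. The stacked-iterate layout (one row per node), the function name `d_masg`, and the single `grad` callable standing in for the stacked local stochastic gradients ∇̃f are assumptions made for illustration; W is the doubly stochastic mixing matrix of the network.

```python
import numpy as np

def d_masg(x0, W, grad, mu, alphas, stage_lens):
    """Sketch of Algorithm 1 (D-MASG).
    x0: (N, d) stacked node iterates; W: (N, N) doubly stochastic mixing matrix;
    grad(Y): (N, d) stacked (stochastic) local gradients at the stacked point Y;
    alphas[t], stage_lens[t]: stepsize and length of stage t."""
    x_cur = x0.copy()
    for alpha, k in zip(alphas, stage_lens):
        # Stage-t momentum: beta_t = (1 - sqrt(mu*alpha_t)) / (1 + sqrt(mu*alpha_t))
        beta = (1 - np.sqrt(mu * alpha)) / (1 + np.sqrt(mu * alpha))
        x_prev = x_cur  # x(t, -1) = x(t, 0) = last iterate of previous stage
        for _ in range(k):
            y = (1 + beta) * x_cur - beta * x_prev
            x_prev, x_cur = x_cur, W @ y - alpha * grad(y)
    return x_cur
```

On a toy strongly convex quadratic (grad(Y) = Y, so µ = L = 1) with an averaging matrix W, the iterates contract toward the common minimizer at zero, which matches the multistage convergence behavior the algorithm is designed for.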