DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity
Authors: Chung-Yiu Yau, Hoi-To Wai
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate that our algorithm outperforms several state-of-the-art algorithms in practice. We empirically evaluate the performance of DoCoM on training linear models and deep learning models using synthetic and real data, on non-convex losses. |
| Researcher Affiliation | Academia | Chung-Yiu Yau EMAIL The Chinese University of Hong Kong. Hoi-To Wai EMAIL The Chinese University of Hong Kong. |
| Pseudocode | Yes | Algorithm 1: DoCoM Algorithm |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the DoCoM algorithm, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Synthetic Data with Linear Model: Consider a set of synthetic data generated with the LEAF benchmarking framework (Caldas et al., 2019)... MNIST Data with Feed-forward Network: ...on the MNIST dataset... FEMNIST Data with LeNet-5: ...on the FEMNIST dataset. |
| Dataset Splits | No | The paper describes how data is partitioned among agents (e.g., 'm = 1443 samples partitioned into n = 25 non-i.i.d. portions', 'samples are partitioned into n = 10 agents where each agent only gets 1 class of samples'), but it does not specify conventional training, validation, or test dataset splits in terms of percentages or absolute counts. |
| Hardware Specification | Yes | We run the decentralized optimization algorithms on a 40-thread Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz server with MPI-enabled PyTorch and evaluate the performance of trained models on a Tesla K80 GPU server. |
| Software Dependencies | No | The paper mentions 'MPI-enabled PyTorch' but does not provide specific version numbers for PyTorch, MPI, or any other software dependencies. |
| Experiment Setup | Yes | For all algorithms we choose the learning rate η from {0.1, 0.01, 0.001}, and fix the regularization parameter as λ = 10⁻⁴ ... For DoCoM and GT-HSGD, we choose the best momentum parameter β in {0.0001, 0.001, 0.01, 0.1, 0.5, 0.9} and fix the initial batch number as b0,i = mi. We choose the batch sizes such that all algorithms spend the same amount of computation on stochastic gradient per iteration... Table 2: Tuned hyper-parameters for linear model on synthetic dataset... Table 3: Tuned hyper-parameters for 1-layer feed-forward network on MNIST... Table 4: Tuned hyper-parameters for LeNet-5 on FEMNIST. |
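The Experiment Setup row reports the tuning grids but not the search procedure itself. A minimal sketch of what the reported setup implies, assuming a plain exhaustive grid over the learning rate and momentum parameter (the `candidate_configs` helper and the fixed λ = 1e-4 are illustrative assumptions, not from the paper's code):

```python
from itertools import product

# Hyper-parameter grids as reported in the paper's Experiment Setup.
learning_rates = [0.1, 0.01, 0.001]
momentum_betas = [0.0001, 0.001, 0.01, 0.1, 0.5, 0.9]  # tuned for DoCoM and GT-HSGD only

def candidate_configs():
    """Enumerate (eta, beta) pairs for an exhaustive grid search.

    The regularization parameter lambda is held fixed (assumed 1e-4 here)
    rather than tuned, matching the paper's description.
    """
    return [
        {"eta": eta, "beta": beta, "lam": 1e-4}
        for eta, beta in product(learning_rates, momentum_betas)
    ]

configs = candidate_configs()
print(len(configs))  # 3 learning rates x 6 momentum values = 18 configurations
```

In practice each configuration would be trained to completion and the best (η, β) pair reported per dataset, which is what Tables 2-4 in the paper tabulate.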