DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample Complexity

Authors: Chung-Yiu Yau, Hoi-To Wai

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments demonstrate that our algorithm outperforms several state-of-the-art algorithms in practice. We empirically evaluate the performance of DoCoM on training linear models and deep learning models using synthetic and real data, on non-convex losses."
Researcher Affiliation | Academia | Chung-Yiu Yau EMAIL, The Chinese University of Hong Kong; Hoi-To Wai EMAIL, The Chinese University of Hong Kong.
Pseudocode | Yes | "Algorithm 1: DoCoM Algorithm"
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the DoCoM algorithm, nor does it provide a link to a code repository.
Open Datasets | Yes | Synthetic data with a linear model: "Consider a set of synthetic data generated with the LEAF benchmarking framework (Caldas et al., 2019)..."; MNIST data with a feed-forward network: "...on the MNIST dataset..."; FEMNIST data with LeNet-5: "...on the FEMNIST dataset."
Dataset Splits | No | The paper describes how data is partitioned among agents (e.g., "m = 1443 samples partitioned into n = 25 non-i.i.d. portions", "samples are partitioned into n = 10 agents where each agent only gets 1 class of samples"), but it does not specify conventional training, validation, or test splits as percentages or absolute counts.
Hardware Specification | Yes | "We run the decentralized optimization algorithms on a 40-thread Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz server with MPI-enabled PyTorch and evaluate the performance of trained models on a Tesla K80 GPU server."
Software Dependencies | No | The paper mentions "MPI-enabled PyTorch" but does not provide specific version numbers for PyTorch, MPI, or any other software dependencies.
Experiment Setup | Yes | "For all algorithms we choose the learning rate η from {0.1, 0.01, 0.001}, and fix the regularization parameter as λ = 10⁻⁴ ... For DoCoM and GT-HSGD, we choose the best momentum parameter β in {0.0001, 0.001, 0.01, 0.1, 0.5, 0.9} and fix the initial batch number as b_{0,i} = m_i. We choose the batch sizes such that all algorithms spend the same amount of computation on stochastic gradient per iteration..." Table 2: Tuned hyper-parameters for the linear model on the synthetic dataset; Table 3: Tuned hyper-parameters for the 1-layer feed-forward network on MNIST; Table 4: Tuned hyper-parameters for LeNet-5 on FEMNIST.
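The quoted setup amounts to a small grid search: three learning rates for every algorithm, crossed with six momentum values for DoCoM and GT-HSGD only, with the regularization fixed. A minimal sketch of that grid, assuming λ = 10⁻⁴ and using illustrative names not taken from the paper's (unreleased) code:

```python
# Hedged sketch of the hyper-parameter search described in the paper's
# experiment setup. All names here are illustrative assumptions.
LEARNING_RATES = [0.1, 0.01, 0.001]                    # η, tuned per algorithm
REGULARIZATION = 1e-4                                   # λ, fixed
MOMENTUM_GRID = [0.0001, 0.001, 0.01, 0.1, 0.5, 0.9]    # β, DoCoM / GT-HSGD only


def candidate_configs(algorithm):
    """Enumerate the (η, β, λ) combinations searched for a given algorithm."""
    # Only DoCoM and GT-HSGD have a momentum parameter to tune.
    betas = MOMENTUM_GRID if algorithm in ("DoCoM", "GT-HSGD") else [None]
    return [
        {"lr": lr, "beta": beta, "lam": REGULARIZATION}
        for lr in LEARNING_RATES
        for beta in betas
    ]


print(len(candidate_configs("DoCoM")))  # 3 learning rates x 6 momenta = 18
```

The paper then reports the tuned values per task in Tables 2-4; the sketch only reproduces the search space, not the selected configurations.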