Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction
Authors: Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency. Our work suggests that by performing a judiciously chosen amount of local communication and computation per iteration, the overall efficiency can be substantially improved. Keywords: decentralized optimization, federated learning, communication efficiency, gradient tracking, variance reduction |
| Researcher Affiliation | Academia | Boyue Li, Shicong Cen (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Yuxin Chen (Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA); Yuejie Chi (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA) |
| Pseudocode | Yes | Algorithm 1 (Network-DANE); Algorithm 2 (Network-DANE for nonsmooth composite optimization); Algorithm 3 (Network-SVRG/SARAH); Algorithm 4 (Network-DANE for quadratic losses, Eq. (8)) |
| Open Source Code | Yes | Code for our experiments can be found at https://github.com/liboyue/Network-Distributed-Algorithm/tree/JMLR. |
| Open Datasets | Yes | Binary classification using logistic regression. We use regularized logistic regression to solve a binary classification problem using the Gisette dataset. Neural network training... using the MNIST dataset. |
| Dataset Splits | Yes | We split the Gisette dataset to n = 20 agents, where each agent receives m = 300 training samples of dimension d = 5000. We split 60,000 training samples to 20 agents and use an Erdős–Rényi graph with p = 0.3 for communications. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | For DANE and Network-DANE, we set µ = 5 · 10⁻¹⁰ when κ = 10 and µ = 5 · 10⁻⁴ when κ = 10⁴. For Network-SVRG/SARAH, we set the step size δ = 0.1/(L + σ + 2µ) and the number of local iterations S = 0.05m. To solve the local optimization subproblems, we use Nesterov’s accelerated gradient descent for at most 100 iterations for DANE and Network-DANE. |
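The reported hyperparameter choices can be summarized in a minimal sketch. This is not the authors' code: the values of L and σ passed below are illustrative placeholders (the paper's quoted setup does not state them here), and only the formulas δ = 0.1/(L + σ + 2µ), S = 0.05m, and the two reported µ values come from the source.

```python
# Hedged sketch of the reported experiment settings. Only the formulas and
# the two µ values are from the paper; L, sigma, and m below are assumptions.

def network_svrg_hyperparams(L, sigma, mu, m):
    """Step size and local-iteration count reported for Network-SVRG/SARAH."""
    delta = 0.1 / (L + sigma + 2 * mu)  # step size δ = 0.1 / (L + σ + 2µ)
    S = int(0.05 * m)                   # number of local iterations S = 0.05 m
    return delta, S

# µ values reported for DANE / Network-DANE at the two condition numbers κ.
MU_BY_KAPPA = {10: 5e-10, 10**4: 5e-4}

# Example with placeholder smoothness constants and the Gisette per-agent
# sample count m = 300 quoted in the Dataset Splits row.
delta, S = network_svrg_hyperparams(L=1.0, sigma=0.1, mu=MU_BY_KAPPA[10], m=300)
```

With these placeholder constants, S comes out to 15 local iterations per round; the step size scales inversely with the smoothness-related constants, so better-conditioned problems permit larger steps.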