Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Authors: Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency. Our work suggests that by performing a judiciously chosen amount of local communication and computation per iteration, the overall efficiency can be substantially improved. Keywords: decentralized optimization, federated learning, communication efficiency, gradient tracking, variance reduction"
Researcher Affiliation | Academia | Boyue Li, Shicong Cen (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA); Yuxin Chen (Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA); Yuejie Chi (Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA)
Pseudocode | Yes | Algorithm 1: Network-DANE; Algorithm 2: Network-DANE for nonsmooth composite optimization; Algorithm 3: Network-SVRG/SARAH; Algorithm 4: Network-DANE for quadratic losses (8)
Open Source Code | Yes | "Code for our experiments can be found at https://github.com/liboyue/Network-Distributed-Algorithm/tree/JMLR."
Open Datasets | Yes | Binary classification using logistic regression: "We use regularized logistic regression to solve a binary classification problem using the Gisette dataset." Neural network training: "... using the MNIST dataset."
Dataset Splits | Yes | "We split the Gisette dataset to n = 20 agents, where each agent receives m = 300 training samples of dimension d = 5000. We split 60,000 training samples to 20 agents and use an Erdős–Rényi graph with p = 0.3 for communications."
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper.
Experiment Setup | Yes | "For DANE and Network-DANE, we set µ = 5 · 10^-10 when κ = 10 and µ = 5 · 10^-4 when κ = 10^4. For Network-SVRG/SARAH, we set the step size δ = 0.1/(L + σ + 2µ) and the number of local iterations S = 0.05m. To solve the local optimization subproblems, we use Nesterov's accelerated gradient descent for at most 100 iterations for DANE and Network-DANE."
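The Open Datasets row mentions regularized logistic regression for binary classification on Gisette. As an illustrative sketch only (not the authors' code, and the paper's exact scaling of the regularizer may differ), a standard ℓ2-regularized logistic objective and its gradient for labels in {-1, +1} can be written as:

```python
import numpy as np

def logistic_loss_grad(w, X, y, mu):
    """Regularized logistic loss and gradient for labels y in {-1, +1}.

    f(w) = (1/m) * sum_i log(1 + exp(-y_i * x_i^T w)) + (mu/2) * ||w||^2
    A standard formulation; constants and scaling are illustrative.
    """
    m = X.shape[0]
    z = y * (X @ w)                          # per-sample margins
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * mu * (w @ w)
    sigma = 1.0 / (1.0 + np.exp(z))          # derivative factor of the logistic term
    grad = -(X.T @ (y * sigma)) / m + mu * w
    return loss, grad
```

A finite-difference check against `grad` is a quick way to validate such an implementation before plugging it into a distributed solver.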
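The Dataset Splits row describes partitioning 60,000 samples across n = 20 agents that communicate over an Erdős–Rényi graph with p = 0.3. A minimal sketch of how such a topology and even split might be generated (illustrative helper names, not the repository's code):

```python
import numpy as np

def erdos_renyi_graph(n, p, seed=0):
    """Sample a symmetric Erdos-Renyi adjacency matrix with no self-loops."""
    rng = np.random.default_rng(seed)
    coins = rng.random((n, n)) < p      # independent edge coin flips
    upper = np.triu(coins, k=1)         # keep strictly-upper-triangular entries
    return upper | upper.T              # symmetrize: undirected graph

def split_indices(num_samples, n_agents, seed=0):
    """Shuffle sample indices and split them evenly across agents."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_samples), n_agents)

adj = erdos_renyi_graph(20, 0.3)        # 20-agent communication topology
parts = split_indices(60_000, 20)       # 3,000 samples per agent
```

In practice one would also verify the sampled graph is connected (and, for gradient-tracking methods, build a doubly stochastic mixing matrix from it), but those steps depend on details not quoted here.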
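The Experiment Setup row states that the local subproblems of DANE and Network-DANE are solved with Nesterov's accelerated gradient descent for at most 100 iterations. A generic constant-momentum variant, with illustrative step size and momentum (the paper does not quote these values), looks like:

```python
import numpy as np

def nesterov_agd(grad, x0, step, momentum=0.9, max_iter=100):
    """Nesterov accelerated gradient descent with constant momentum.

    grad: callable returning the gradient at a point.
    step: step size (e.g. on the order of 1/L for an L-smooth objective).
    Runs for at most `max_iter` iterations, matching the paper's cap of
    100 local iterations; momentum and step values here are illustrative.
    """
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(max_iter):
        y = x + momentum * (x - x_prev)  # extrapolated look-ahead point
        x_prev = x
        x = y - step * grad(y)           # gradient step from the look-ahead
    return x

# Toy usage: minimize the strongly convex quadratic f(x) = 0.5 * ||A x - b||^2.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
x_star = np.linalg.solve(A, b)           # closed-form minimizer
x_hat = nesterov_agd(grad, np.zeros(2), step=0.05)
```

The actual local objective in (Network-)DANE is the quoted regularized subproblem with parameter µ, so `grad` would be replaced by that subproblem's gradient.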