Multi-Consensus Decentralized Accelerated Gradient Descent

Authors: Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6. Experiments: We evaluate the performance of our algorithms on (sparse) logistic regression with different settings, including the situation in which each fi(x) is strongly convex and the one in which the local function fi(x) may be non-convex. ... We compare our algorithm (Mudag) to centralized accelerated gradient descent (AGD) in (Nesterov, 2018), EXTRA in (Shi et al., 2015b), NIDS in (Li et al., 2019), Acc-DNGD in (Qu and Li, 2019), and APM-C in (Li et al., 2020b)."
Researcher Affiliation | Academia | Haishan Ye (EMAIL), Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China; Luo Luo (EMAIL), School of Data Science, Fudan University, Shanghai, China; Ziang Zhou (EMAIL), Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; Tong Zhang (EMAIL), Computer Science & Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
Pseudocode | Yes | Algorithm 1 (Mudag) ... Algorithm 2 (FastMix) ... Algorithm 3 (Prox-Mudag)
Open Source Code | No | The text does not contain any explicit statement about releasing source code, a link to a code repository, or any mention of code in supplementary materials.
Open Datasets | Yes | "We conduct our experiments on a real-world dataset a9a, which can be downloaded from the LIBSVM repository (Chang and Lin, 2011). ... For w8a, we set n = 497 and d = 300. For a9a, we set n = 325 and d = 123."
Dataset Splits | No | The paper reports dataset characteristics such as n = 325 and d = 123 for a9a and n = 497 and d = 300 for w8a, and describes how the local objective functions fi(x) are defined across agents with different regularization parameters. However, it does not provide training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper evaluates performance in terms of "Number of Gradient Computations" and "Number of Communications" in the experimental section, but does not describe the hardware (e.g., CPU, GPU, or TPU models, memory, or cloud computing resources) used to run the experiments.
Software Dependencies | No | The paper refers to existing algorithms and datasets, such as LIBSVM and various optimization methods, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, or specific solver versions) used in the experimental implementation.
Experiment Setup | Yes | "In our experiments, we consider random networks where each pair of agents has a connection with probability p. We set W = I - L/λ1(L), where L is the Laplacian matrix associated with a weighted graph and λ1(L) is the largest eigenvalue of L. We also set m = 100, that is, there are 100 agents in this network. We run the algorithms with p = 0.1 and p = 0.5, which correspond to 1 - λ2(W) = 0.05 and 1 - λ2(W) = 0.81, respectively. ... We set σ1 = ... = σm = 10^-3, so that each fi(x) is strongly convex. ... The step sizes of all algorithms are well-tuned to achieve their best performance. Furthermore, we set the momentum coefficient to (√L - √µ)/(√L + √µ) for Mudag, AGD, and APM-C. We initialize x0 at 0 for all the compared methods. In the experiments, we set K = 1, K = 2, and K = 3 in Prox-Mudag to evaluate how K affects the convergence behavior."
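The mixing-matrix construction quoted in the Experiment Setup row (W = I - L/λ1(L) on a random network where each pair of agents connects with probability p) can be sketched as follows. This is a minimal sketch: the paper uses a weighted graph, while the version below assumes unit edge weights, and the helper name `mixing_matrix` and the random seed are illustrative.

```python
import numpy as np

def mixing_matrix(m, p, rng):
    """Build W = I - L/lambda_1(L) for a random graph on m agents.

    Each pair of agents is connected independently with probability p
    (unit edge weights assumed); L is the graph Laplacian and
    lambda_1(L) its largest eigenvalue.
    """
    # Sample a symmetric adjacency matrix with edge probability p.
    upper = np.triu(rng.random((m, m)) < p, k=1)
    A = (upper | upper.T).astype(float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    lam1 = np.linalg.eigvalsh(L)[-1]        # largest eigenvalue of L
    return np.eye(m) - L / lam1

rng = np.random.default_rng(0)
W = mixing_matrix(100, 0.1, rng)            # m = 100 agents, p = 0.1

# W is symmetric and doubly stochastic; 1 - lambda_2(W) is the spectral
# gap that governs the consensus rate in the experiments.
eigs = np.sort(np.linalg.eigvalsh(W))
gap = 1.0 - eigs[-2]
```

Since L has zero row sums and eigenvalues in [0, λ1(L)], the resulting W is doubly stochastic with eigenvalues in [0, 1], which is why 1 - λ2(W) serves as the connectivity measure in the quoted setup.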
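Algorithm 2 (FastMix) listed under Pseudocode is an accelerated consensus subroutine. A minimal sketch of a Chebyshev-style accelerated-gossip recurrence of the kind FastMix performs is below, assuming the standard step x_{k+1} = (1+η) W x_k - η x_{k-1} with η derived from λ2(W); the exact constants and initialization in the paper's algorithm may differ.

```python
import numpy as np

def fast_mix(X, W, K):
    """Run K rounds of accelerated gossip on the rows of X.

    Recurrence (assumed form): x_{k+1} = (1 + eta) * W x_k - eta * x_{k-1},
    with eta = (1 - sqrt(1 - lam2^2)) / (1 + sqrt(1 - lam2^2)),
    lam2 = second-largest eigenvalue of the mixing matrix W.
    """
    lam2 = np.sort(np.linalg.eigvalsh(W))[-2]
    eta = (1 - np.sqrt(1 - lam2 ** 2)) / (1 + np.sqrt(1 - lam2 ** 2))
    x_prev, x = X, W @ X                       # first plain gossip round
    for _ in range(K - 1):                     # K - 1 accelerated rounds
        x, x_prev = (1 + eta) * (W @ x) - eta * x_prev, x
    return x

# Demo on a 5-agent ring with W = I - L/lambda_1(L), as in the setup above.
m = 5
A = np.zeros((m, m))
for i in range(m):
    A[i, (i + 1) % m] = A[(i + 1) % m, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
W = np.eye(m) - L / np.linalg.eigvalsh(L)[-1]

X = np.random.default_rng(0).standard_normal((m, 3))
Y = fast_mix(X, W, 6)
# Gossip preserves the network average while shrinking disagreement.
dev_before = np.linalg.norm(X - X.mean(axis=0))
dev_after = np.linalg.norm(Y - Y.mean(axis=0))
```

Because W is doubly stochastic, every round preserves the per-coordinate average across agents, so repeated mixing drives all rows of X toward that average.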
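The experiments run the algorithms on (sparse) logistic regression, where a regularization parameter σi makes each local objective fi(x) strongly convex. A per-agent sketch of such a local loss and its gradient is below; `local_loss_grad` is a hypothetical helper, and the exact regularizer placement and data scaling in the paper are assumptions.

```python
import numpy as np

def local_loss_grad(x, A, b, sigma):
    """Local objective of agent i (assumed form):

        f_i(x) = (1/n) * sum_j log(1 + exp(-b_j * a_j^T x)) + (sigma/2) ||x||^2

    With sigma > 0 this is sigma-strongly convex, matching the strongly
    convex setting in the experiments. Returns (loss, gradient).
    """
    z = b * (A @ x)                                  # margins b_j * a_j^T x
    loss = np.mean(np.log1p(np.exp(-z))) + 0.5 * sigma * (x @ x)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), chained with dz/dx = b_j a_j
    g = -(A.T @ (b / (1 + np.exp(z)))) / len(b) + sigma * x
    return loss, g

# Finite-difference check of the gradient on random data.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 5))
b = np.where(rng.random(40) < 0.5, -1.0, 1.0)       # labels in {-1, +1}
x = rng.standard_normal(5)
loss, g = local_loss_grad(x, A, b, 1e-3)

eps = 1e-6
g_fd = np.array([
    (local_loss_grad(x + eps * e, A, b, 1e-3)[0]
     - local_loss_grad(x - eps * e, A, b, 1e-3)[0]) / (2 * eps)
    for e in np.eye(5)
])
```

In the decentralized setting, each agent holds its own (A, b) shard and σi, and the algorithms combine these local gradients through the mixing matrix W.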