Multi-Consensus Decentralized Accelerated Gradient Descent
Authors: Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments We evaluate the performance of our algorithms on (sparse) logistic regression with different settings, including the situation in which each fi(x) is strongly convex and the local function fi(x) may be non-convex. ... We compare our algorithm (Mudag) to centralized accelerated gradient descent (AGD) in (Nesterov, 2018), EXTRA in (Shi et al., 2015b), NIDS in (Li et al., 2019), Acc-DNGD in (Qu and Li, 2019) and APM-C in (Li et al., 2020b). |
| Researcher Affiliation | Academia | Haishan Ye EMAIL Center for Intelligent Decision-Making and Machine Learning School of Management Xi'an Jiaotong University Xi'an, China Luo Luo EMAIL School of Data Science Fudan University Shanghai, China Ziang Zhou EMAIL Department of Computing The Hong Kong Polytechnic University Hong Kong, China Tong Zhang EMAIL Computer Science & Mathematics The Hong Kong University of Science and Technology Hong Kong, China |
| Pseudocode | Yes | Algorithm 1 Mudag ... Algorithm 2 Fast Mix ... Algorithm 3 Prox Mudag |
| Open Source Code | No | The text does not contain any explicit statement about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | We conduct our experiments on a real-world dataset a9a which can be downloaded from LIBSVM repository (Chang and Lin, 2011). ... For w8a , we set n = 497 and d = 300. For a9a , we set n = 325 and d = 123. |
| Dataset Splits | No | The paper mentions characteristics of datasets like 'n = 325 and d = 123' for a9a and 'n = 497 and d = 300' for w8a, and describes how local objective functions fi(x) are defined across agents with different regularization parameters. However, it does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper evaluates performance based on 'Number of Gradient Computations' and 'Number of Communications' in the experimental section, but does not provide specific details regarding the hardware (e.g., CPU, GPU, or TPU models, memory, or cloud computing resources) used to run the experiments. |
| Software Dependencies | No | The paper refers to existing algorithms and datasets, such as LIBSVM and various optimization methods, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, or specific solver versions) used in their experimental implementation. |
| Experiment Setup | Yes | In our experiments, we consider random networks where each pair of agents have a connection with a probability of p. We set W = I - L/λ1(L), where L is the Laplacian matrix associated with a weighted graph, and λ1(L) is the largest eigenvalue of L. We also set m = 100, that is, there exist 100 agents in this network. In our experiments, we run the algorithms on the settings of p = 0.1 and p = 0.5, which correspond to 1 - λ2(W) = 0.05 and 1 - λ2(W) = 0.81 respectively. ... We set σ1 = ... = σm = 10^{-3}, then each fi(x) is strongly convex. ... The step sizes of all algorithms are well-tuned to achieve their best performances. Furthermore, we set the momentum coefficient as (√L - √µ)/(√L + √µ) for Mudag, AGD and APM-C. We initialize x0 at 0 for all the compared methods. In the experiments, we set K = 1, K = 2 and K = 3 in Prox Mudag to evaluate how K affects the convergence behavior. |
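The Experiment Setup row describes the mixing matrix W = I - L/λ1(L) built over an Erdős–Rényi random network, and the Pseudocode row names Algorithm 2 (Fast Mix), a Chebyshev-accelerated gossip routine that applies W repeatedly. A minimal NumPy sketch of both pieces follows; function names, the random seed, and the choice K = 20 are our own, and the heavy-ball form of the gossip recursion is the standard Chebyshev acceleration, not a verbatim transcription of the paper's Algorithm 2:

```python
import numpy as np

def gossip_matrix(m=100, p=0.5, seed=0):
    """W = I - L/lambda_1(L) for an Erdos-Renyi graph, as in the quoted setup."""
    rng = np.random.default_rng(seed)
    upper = rng.random((m, m)) < p
    A = np.triu(upper, k=1)
    A = (A | A.T).astype(float)           # symmetric adjacency, zero diagonal
    Lap = np.diag(A.sum(axis=1)) - A      # graph Laplacian L = D - A
    lam1 = np.linalg.eigvalsh(Lap)[-1]    # largest eigenvalue of L
    return np.eye(m) - Lap / lam1

def fast_mix(X, W, K):
    """Chebyshev-accelerated gossip (sketch of an Algorithm-2-style FastMix)."""
    lam2 = np.sort(np.linalg.eigvalsh(W))[-2]   # second-largest eigenvalue of W
    eta = (1 - np.sqrt(1 - lam2**2)) / (1 + np.sqrt(1 - lam2**2))
    X_prev = X.copy()
    for _ in range(K):
        X, X_prev = (1 + eta) * (W @ X) - eta * X_prev, X
    return X

W = gossip_matrix()                                  # m = 100 agents, p = 0.5
X = np.random.default_rng(1).normal(size=(100, 5))   # one local variable per row
X_mixed = fast_mix(X, W, K=20)
# each row of X_mixed approaches the network-wide average of the rows of X
```

With p = 0.5 the spectral gap 1 - λ2(W) is large (the paper reports 0.81 for its network), so a handful of Fast Mix rounds already drives every agent's variable close to the network average while the average itself is preserved exactly (W is doubly stochastic).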
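The Research Type row quotes that each agent holds a (sparse) logistic regression objective, and the Experiment Setup row fixes σ1 = ... = σm = 10^{-3} so that each fi is strongly convex. A hedged sketch of one agent's local loss and gradient is below; the variable names, the a9a-sized shapes (n = 325, d = 123, from the Open Datasets row), and the exact form of the regularizer (written here as σ‖x‖², where the paper's excerpt does not pin down a 1/2 factor) are our assumptions:

```python
import numpy as np

def local_loss(x, A, b, sigma=1e-3):
    # A: (n, d) local features; b: (n,) labels in {-1, +1}
    # regularizer written as sigma * ||x||^2 -- exact form is an assumption
    margins = -b * (A @ x)
    return np.mean(np.log1p(np.exp(margins))) + sigma * (x @ x)

def local_grad(x, A, b, sigma=1e-3):
    margins = -b * (A @ x)
    s = 1.0 / (1.0 + np.exp(-margins))    # sigmoid of the margins
    return -(A.T @ (b * s)) / len(b) + 2.0 * sigma * x

# synthetic stand-in for one agent's a9a shard (n = 325, d = 123)
rng = np.random.default_rng(0)
A = rng.normal(size=(325, 123))
b = np.where(rng.random(325) < 0.5, -1.0, 1.0)
x = 0.1 * rng.normal(size=123)
g = local_grad(x, A, b)
```

Setting sigma = 0 recovers the plain logistic loss, which is convex but not strongly convex; the paper's non-convex setting would instead use a non-convex local regularizer, which this sketch does not attempt to reproduce.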