Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
Authors: Huan Li, Zhouchen Lin
JMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test the performance of the accelerated gradient tracking (Acc-GT) over time-varying graphs. The performance of Acc-GT over static graphs has already been verified in (Qu and Li, 2020). Moreover, Qu and Li (2020) reported in their experiments that algorithm (12a)-(12d) with a fixed step size (our theoretical setting) performs faster than the one with vanishing step sizes (their theoretical setting). Thus, we omit the comparisons over static graphs. We consider the following decentralized regularized logistic regression problem: $\min_x \frac{1}{m}\sum_{i=1}^{m} f_{(i)}(x)$, where $f_{(i)}(x) = \frac{\mu}{2}\\|x\\|^2 + \frac{1}{n}\sum_{j=1}^{n}\log\left(1+\exp\left(-y_{(i),j}A_{(i),j}^{\top}x\right)\right)$, and $(A_{(i),j}, y_{(i),j}) \in \mathbb{R}^p \times \{1,-1\}$ is a data point with $A_{(i),j}$ the feature vector and $y_{(i),j}$ the label. We use the cifar10 dataset with $p = 3072$, $n = 50$, and $m = 1000$. Each feature vector is normalized to have unit norm, and the data are divided into two classes to fit the logistic regression model. We observe that $L = \max_i \frac{\\|A_{(i)}\\|_2^2}{4n} \approx 0.215$. We consider both strongly convex ($\mu = 10^{-6}$) and nonstrongly convex ($\mu = 0$) problems. We test the performance on 2D grid graphs, where at each iteration $m$ nodes are placed uniformly at random in a $5\sqrt{m} \times 5\sqrt{m}$ region, and each node is connected to the nodes within distance $d$ of it. We test on $d = 20$ and $d = 2$, which correspond to $(\gamma, \sigma_\gamma) \approx (1, 0.9858)$ and $(\gamma, \sigma_\gamma) \approx (32, 0.9471)$, respectively. When $d = 20$, the network is connected almost every time. When $d = 2$, we observe that at each iteration about 61 percent of the nodes on average drop out of the communication network, meaning that they have no connection with the other nodes. We use the Metropolis gossip matrix given in (8). For the strongly convex problem, we compare Acc-GT and Acc-GT-C (Acc-GT with multiple consensus) with DIGing (Nedić et al., 2017), DAGD-C (Rogozin et al., 2021b), as well as the classical non-distributed accelerated gradient descent (AGD), where AGD runs on a single machine and gives the upper limit of the practical performance of the distributed algorithms. [...] Figures 1-4 plot the results, where the objective function error is measured by $F(x^k) - F(x^*)$, and the consensus error is measured by $\sqrt{\frac{\sum_{i=1}^{m}\\|x^k_{(i)} - \bar{x}^k\\|^2}{m\\|\bar{x}^k\\|^2}}$. |
| Researcher Affiliation | Academia | Huan Li EMAIL Institute of Robotics and Automatic Information Systems College of Artificial Intelligence Nankai University Tianjin 300071, China; Zhouchen Lin B EMAIL State Key Lab of General AI, School of Intelligence Science and Technology, Peking University Institute for Artificial Intelligence, Peking University, Beijing 100871, China Pazhou Laboratory (Huangpu), Guangzhou 510555, China |
| Pseudocode | Yes | Algorithm 1 Accelerated Gradient Tracking (Acc-GT). Initialize: $x^0_{(i)} = y^0_{(i)} = z^0_{(i)} = x_{\text{init}}$, $s^0_{(i)} = \nabla f_{(i)}(y^0_{(i)})$, $z^1_{(i)} = \sum_{j\in\mathcal{N}(i)} W^0_{ij} z^0_{(j)} - \frac{\alpha}{\theta_0+\mu\alpha} s^0_{(i)}$, and $x^1_{(i)} = \theta_0 z^1_{(i)} + (1-\theta_0)\sum_{j\in\mathcal{N}(i)} W^0_{ij} x^0_{(j)}$. For $k = 1, 2, \ldots$ do: $y^k_{(i)} = \theta_k z^k_{(i)} + (1-\theta_k) x^k_{(i)}$; $s^k_{(i)} = \sum_{j\in\mathcal{N}(i)} W^k_{ij} s^{k-1}_{(j)} + \nabla f_{(i)}(y^k_{(i)}) - \nabla f_{(i)}(y^{k-1}_{(i)})$; $z^{k+1}_{(i)} = \frac{1}{1+\mu\alpha/\theta_k}\left(\sum_{j\in\mathcal{N}(i)} W^k_{ij}\left(\frac{\mu\alpha}{\theta_k} y^k_{(j)} + z^k_{(j)}\right) - \frac{\alpha}{\theta_k} s^k_{(i)}\right)$; $x^{k+1}_{(i)} = \theta_k z^{k+1}_{(i)} + (1-\theta_k)\sum_{j\in\mathcal{N}(i)} W^k_{ij} x^k_{(j)}$. In this paper, we study the following accelerated gradient tracking with time-varying gossip matrices: $y^k = \theta_k z^k + (1-\theta_k)x^k$ (13a), $s^k = W^k s^{k-1} + \nabla f(y^k) - \nabla f(y^{k-1})$ (13b), $z^{k+1} = \frac{1}{1+\mu\alpha/\theta_k}\left(W^k\left(\frac{\mu\alpha}{\theta_k} y^k + z^k\right) - \frac{\alpha}{\theta_k} s^k\right)$ (13c), $x^{k+1} = \theta_k z^{k+1} + (1-\theta_k) W^k x^k$ (13d), where we initialize $x^0$ such that $\Pi x^0 = 0$. We give the specific description of the method in Algorithm 1 in a distributed way. |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | We use the cifar10 dataset with p = 3072, n = 50, and m = 1000. Each feature vector is normalized to have unit norm, and the data are divided into two classes to fit the logistic regression model. |
| Dataset Splits | No | The paper states that "the data are divided into two classes to fit the logistic regression model" but does not specify how the dataset is split into training, validation, or test sets, nor does it provide percentages or sample counts for such splits. It mentions using the 'cifar10' dataset, which typically has standard splits, but these are not explicitly stated or referenced in the paper. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments. It makes a general statement about 'CPU speed' versus 'communication speed' but no models or specifications of the experimental hardware are mentioned. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used to replicate the experiments. |
| Experiment Setup | Yes | We use the cifar10 dataset with $p = 3072$, $n = 50$, and $m = 1000$. Each feature vector is normalized to have unit norm, and the data are divided into two classes to fit the logistic regression model. We observe that $L = \max_i \frac{\\|A_{(i)}\\|_2^2}{4n} \approx 0.215$. We consider both strongly convex ($\mu = 10^{-6}$) and nonstrongly convex ($\mu = 0$) problems. We tune the step sizes $\alpha = 0.1/L$ for Acc-GT and Acc-GT-C, $\alpha = 0.5/L$ for DIGing, and $\alpha = 1/L$ for AGD. For DAGD-C, when $d = 2$, we test on the numbers of inner iterations $T = \frac{\gamma}{3(1-\sigma_\gamma)} \approx 201$ and $T = \frac{\gamma}{2(1-\sigma_\gamma)} \approx 302$, and name the methods DAGD-C1 and DAGD-C2, respectively. When $d = 20$, we test on $T = \frac{\gamma}{5(1-\sigma_\gamma)} \approx 14$ and $T = \frac{\gamma}{4(1-\sigma_\gamma)} \approx 17$, respectively. For Acc-GT-C, we set the number of inner iterations as $T = \frac{\gamma}{50(1-\sigma_\gamma)} \approx 12$ and $T = \frac{\gamma}{10(1-\sigma_\gamma)} \approx 7$ for $d = 2$ and $d = 20$, respectively. The other parameter settings follow the corresponding theorems of each method. For the nonstrongly convex problem, we compare Acc-GT and Acc-GT-C with DIGing (Nedić et al., 2017), APM (Li et al., 2020a), and AGD, and set the same step sizes as above. We tune the step size $\alpha = 1/L$ for APM, and set the number of inner iterations as $T_k = \frac{\gamma\log(k+1)}{100(1-\sigma_\gamma)}$ and $T_k = \frac{\gamma\log(k+1)}{10(1-\sigma_\gamma)}$ at each outer-loop iteration for $d = 2$ and $d = 20$, respectively. |
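Since the paper releases no code, the quoted objective can be sketched numerically. The snippet below is a hypothetical illustration, not the authors' implementation: `local_objective`, `local_gradient`, and `smoothness` are names introduced here, random unit-norm vectors stand in for the cifar10 features, and the formulas assume the regularized logistic regression form and the smoothness estimate $L = \max_i \|A_{(i)}\|_2^2/(4n)$ quoted in the table.

```python
import numpy as np

def local_objective(x, A, y, mu):
    # f_(i)(x) = (mu/2)||x||^2 + (1/n) sum_j log(1 + exp(-y_j <A_j, x>))
    margins = -y * (A @ x)
    return 0.5 * mu * (x @ x) + np.mean(np.logaddexp(0.0, margins))

def local_gradient(x, A, y, mu):
    # Gradient of the objective above: mu*x + (1/n) A^T (-y * sigmoid(margins)).
    margins = -y * (A @ x)
    sig = 1.0 / (1.0 + np.exp(-margins))
    return mu * x + A.T @ (-y * sig) / len(y)

def smoothness(A_blocks, n):
    # L = max_i ||A_(i)||_2^2 / (4n), with ||.||_2 the spectral norm.
    return max(np.linalg.norm(A, 2) ** 2 for A in A_blocks) / (4 * n)

# Stand-in data for one node: n = 50 unit-norm feature vectors, labels in {-1, 1}.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3072))
A /= np.linalg.norm(A, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=50)
```

Because each feature vector has unit norm, $\|A_{(i)}\|_2^2 \le n$, so this estimate of $L$ is at most $1/4$, consistent with the paper's observed value of about 0.215.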
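The time-varying topology described above (nodes redrawn each iteration in a square region, edges within distance $d$, Metropolis gossip weights) can also be sketched. The paper's equation (8) is not quoted in the table, so the standard Metropolis rule $W_{ij} = 1/(1+\max(\deg_i,\deg_j))$ for edges, with $W_{ii}$ absorbing the remainder, is assumed here; `metropolis_matrix` and the parameters are illustrative.

```python
import numpy as np

def metropolis_matrix(positions, d):
    # Metropolis weights for the graph connecting nodes within distance d:
    # W_ij = 1 / (1 + max(deg_i, deg_j)) on edges, W_ii = 1 - sum_{j != i} W_ij.
    m = len(positions)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (dist <= d) & ~np.eye(m, dtype=bool)
    deg = adj.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in np.flatnonzero(adj[i]):
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# One draw of the random graph: m nodes uniform in a 5*sqrt(m) x 5*sqrt(m) square
# (m = 100 here rather than the paper's 1000, to keep the example small).
rng = np.random.default_rng(0)
m = 100
pos = rng.uniform(0.0, 5.0 * np.sqrt(m), size=(m, 2))
W = metropolis_matrix(pos, d=20)
```

The Metropolis rule makes each draw symmetric and doubly stochastic regardless of how many nodes end up isolated, which is what the gossip-matrix assumptions in this line of work require.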
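Finally, the four matrix-form updates (13a)-(13d) quoted in the Pseudocode row can be exercised on toy data. This is a minimal sketch under stated assumptions: $\theta_k$ is frozen at a constant instead of following the paper's parameter rules, quadratic local losses stand in for logistic regression, and a uniform-averaging matrix stands in for the time-varying Metropolis matrices. The point is only the mechanics of the recursion, in particular the gradient-tracking property that the average of $s^k$ tracks the average of the local gradients at $y^k$.

```python
import numpy as np

def acc_gt_step(x, y_prev, z, s, grad, W, alpha, theta, mu):
    # One matrix-form Acc-GT iteration; x, z, s stack the m local p-vectors row-wise,
    # and grad(Y) returns the stack of local gradients evaluated row-wise.
    y = theta * z + (1 - theta) * x                     # (13a)
    s = W @ s + grad(y) - grad(y_prev)                  # (13b): gradient tracking
    z = (W @ (mu * alpha / theta * y + z)
         - alpha / theta * s) / (1 + mu * alpha / theta)  # (13c)
    x = theta * z + (1 - theta) * (W @ x)               # (13d)
    return x, y, z, s

# Toy run: m = 5 nodes, p = 3, quadratic local losses f_(i)(v) = ||v - b_i||^2 / 2.
rng = np.random.default_rng(0)
m, p = 5, 3
b = rng.normal(size=(m, p))
grad = lambda Y: Y - b          # row i is the gradient of f_(i) at Y[i]
W = np.full((m, m), 1.0 / m)    # doubly stochastic stand-in for the gossip matrix
x = rng.normal(size=(m, p))
z = x.copy()
y = x.copy()
s = grad(y)                     # s^0 = local gradients at y^0
for _ in range(20):
    x, y, z, s = acc_gt_step(x, y, z, s, grad, W, alpha=0.1, theta=0.5, mu=1e-6)
```

Because $W$ has unit column sums, averaging (13b) over the nodes cancels the mixing term, so by induction from $s^0 = \nabla f(y^0)$ the node-average of $s^k$ equals the node-average of $\nabla f(y^k)$ at every iteration; that invariant is what the test below checks.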