Zero-Order One-Point Gradient Estimate in Consensus-Based Distributed Stochastic Optimization

Authors: Elissa Mhanna, Mohamad Assaad

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide numerical examples to illustrate the performance of the algorithm 1P-DSG. We compare it with FO distributed methods aiming to achieve consensus, namely DSGT (Pu & Nedić, 2018) and EXTRA (Shi et al., 2015), a ZO distributed algorithm denoted 2P-DSG based on the two-point estimate in (2) (Tang et al., 2021), and a ZO centralized algorithm based on gradient descent (e.g. Flaxman et al. (2004) and Bach & Perchet (2016)) using another one-point estimate, which is presented in (1). We denote the ZO centralized algorithm by 1P-GD. ... We consider a logistic classification problem to classify m images of two digits, labeled as y_ij = +1 or -1, from the MNIST data set (Le Cun & Cortes, 2005).
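The one-point estimators referred to as (1) and (6) are not reproduced in this excerpt. As a rough illustration only, the classical Flaxman-style single-query estimator can be sketched as below; the function name, signature, and the scaled-Rademacher perturbation law are assumptions for this sketch, not the paper's exact construction:

```python
import numpy as np

def one_point_gradient(f, x, gamma, rng):
    """Classical one-point (single-query) zero-order gradient estimate.

    Generic Flaxman-style form, shown for illustration; the paper's
    estimators (1) and (6) may differ in scaling and perturbation law.
    """
    d = x.size
    # Perturbation direction: i.i.d. Rademacher entries scaled to unit norm
    phi = rng.choice([-1.0, 1.0], size=d) / np.sqrt(d)
    # One (possibly noisy) function query at the perturbed point
    return (d / gamma) * f(x + gamma * phi) * phi
```

For a linear function f(x) = c·x the estimate is unbiased (its mean over many perturbations recovers c), which is the basic property these estimators exploit.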
Researcher Affiliation | Academia | Elissa Mhanna (EMAIL), Mohamad Assaad (EMAIL), Laboratoire des Signaux & Systèmes, CentraleSupélec, 3 rue Joliot Curie, 91190 Gif-sur-Yvette, France
Pseudocode | Yes | 2.1 The 1P-DSG Algorithm. We consider a zero-order distributed stochastic gradient algorithm aiming for consensus with a one-point estimate. We denote it as 1P-DSG, employing the gradient estimate g_{i,k} in (6). Every agent i initializes its variables with an arbitrary vector x_{i,0} ∈ K and computes g_{i,0} at that variable. Then, at each time k ∈ N, agent i updates its variables independently according to the following steps: z_{i,k+1} = Σ_{j=1}^{n} w_{ij} (x_{j,k} - α_k g_{j,k}); x_{i,k+1} = Π_K(z_{i,k+1}); perform the action x_{i,k+1} + γ_{k+1} Φ_{i,k+1}, where α_k > 0 is a step size. Algorithm (7) can then be written in the following compact matrix form for clarity of analysis: z_{k+1} = W(x_k - α_k g_k); x_{k+1} = Π_K(z_{k+1}), with x_{k+1} = [x_{1,k+1}, x_{2,k+1}, . . . , x_{n,k+1}]^T; perform the action x_{k+1} + γ_{k+1} Φ_{k+1}, where Φ_k ∈ R^{n×d} is defined as Φ_k = [Φ_{1,k}, Φ_{2,k}, . . . , Φ_{n,k}]^T.
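The per-iteration update quoted above (consensus step on gradient-corrected iterates, projection, then a perturbed action) can be sketched in NumPy. All names here are hypothetical, and a Euclidean ball stands in for the paper's problem-specific constraint set K:

```python
import numpy as np

def project_ball(z, radius=1.0):
    """Euclidean projection onto a ball of given radius, standing in
    for Pi_K; the paper's constraint set K is problem-specific."""
    nrm = np.linalg.norm(z)
    return z if nrm <= radius else z * (radius / nrm)

def one_pdsg_step(X, G, W, alpha_k, gamma_k1, rng):
    """One round of the 1P-DSG update (sketch under stated assumptions).

    X : (n, d) stacked local iterates x_{i,k}
    G : (n, d) stacked one-point gradient estimates g_{i,k}
    W : (n, n) doubly stochastic consensus weight matrix
    """
    # Consensus on gradient-corrected iterates: z_{k+1} = W(x_k - alpha_k g_k)
    Z = W @ (X - alpha_k * G)
    # Row-wise projection onto the constraint set K
    X_new = np.vstack([project_ball(z) for z in Z])
    # Perturbed action actually played/queried by each agent
    n, d = X.shape
    Phi = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(d)
    actions = X_new + gamma_k1 * Phi
    return X_new, actions
```

With a full-averaging weight matrix W the consensus step collapses all local iterates to their mean in one round; a sparse doubly stochastic W tied to the communication graph only mixes neighbors, which is the setting the algorithm targets.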
Open Source Code | No | No explicit statement regarding the release of source code for this paper's methodology was found.
Open Datasets | Yes | We consider a logistic classification problem to classify m images of two digits, labeled as y_ij = +1 or -1, from the MNIST data set (Le Cun & Cortes, 2005).
Dataset Splits | Yes | In Figure 4, we measure at every iteration the classification accuracy against an independent test set of 2167 images, using the updated mean vector θ_k = (1/n) Σ_{i=1}^{n} θ_{i,k} of the local decision variables.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We take σ_u = 0.01 and let α_k = 0.05(k+1)^{-0.75} and γ_k = 0.8(k+1)^{-0.25} for 1P-DSG with vanishing step sizes, and α = 0.05 and γ = 0.6 with constant step sizes. We choose Φ_k ∈ {±1/√d}^d with equal probability. Also, every function query is subject to a white noise generated by the standard normal distribution. For the DSGT algorithm, we set the step size to α_k = 0.015(k+1)^{-1} when it is vanishing and α = 0.015 when constant, and we consider neither the perturbation nor the noise on the objective function, only the noise on the exact gradient. Similarly for EXTRA, where we set the step size to α = 0.01. For 2P-DSG, we consider α_k = 0.01(k+1)^{-0.75} and γ_k = 0.01(k+1)^{-0.25}. For the centralized 1P-GD algorithm, we set α = 0.005 and γ = 0.5 (α = 0.03 and γ = 0.6 with estimator (6)). We let c = 0.1, and the initialization be the same for all algorithms, with θ_{i,0} uniformly chosen from [-0.5, 0.5]^d for all i ∈ N, per instance. We finally average the simulations over 30 instances.
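The vanishing step-size schedules reported for 1P-DSG are easy to reproduce exactly from the quoted constants; a minimal helper, assuming those defaults, could be:

```python
def vanishing_steps(k, a0=0.05, pa=0.75, g0=0.8, pg=0.25):
    """Vanishing step-size schedules reported for 1P-DSG:
    alpha_k = 0.05 (k+1)^{-0.75},  gamma_k = 0.8 (k+1)^{-0.25}.
    Defaults match the quoted setup; other algorithms in the
    comparison use their own constants and exponents."""
    alpha_k = a0 * (k + 1) ** (-pa)
    gamma_k = g0 * (k + 1) ** (-pg)
    return alpha_k, gamma_k
```

Both sequences decay polynomially, with the exploration radius γ_k shrinking more slowly than the step size α_k, the usual pattern for one-point zero-order schemes.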