Distributed Sparse Regression via Penalization

Authors: Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa

JMLR 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: In this section, we provide experiments on synthetic and real data. The former are instrumental to validating our theoretical findings. More specifically, we validate the following theoretical results. (i) On the statistical-error front, we show that with a proper choice of γ, the solution of the distributed formulation (4) achieves the statistical accuracy of the centralized LASSO estimator (Theorem 7); we also validate the dependence of γ on the dimension d [cf. (22)]. On the computational front, (ii) we demonstrate that the DGD-like algorithm (28) exhibits linear convergence up to statistical precision; (iii) we also validate the scaling of the communication rounds with the network size m, as predicted by (57). (iv) Finally, we illustrate the speed-accuracy dilemma, as shown in Theorem 13.
Researcher Affiliation: Academia
LLM Response: Yao Ji (EMAIL), School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA; Gesualdo Scutari (EMAIL), School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA; Ying Sun (EMAIL), School of Electrical Engineering and Computer Science, The Pennsylvania State University, State College, PA 16802, USA; Harsha Honnappa (EMAIL), School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA
Pseudocode: No
LLM Response: The paper describes the algorithm in mathematical terms (e.g., equation (28) for the update rule) and refers to it as the 'proximal-gradient algorithm' or 'DGD-like algorithm', but it does not present formally structured pseudocode in a dedicated block with a label such as 'Algorithm 1'.
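Although the paper provides no pseudocode, the described update, a DGD-like proximal-gradient iteration combining neighbor averaging, a local gradient step, and ℓ1 soft-thresholding, can be sketched as below. This is a generic reconstruction under stated assumptions (function names, step size, and update ordering are ours), not the paper's exact equation (28).

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dgd_prox_lasso(X_parts, y_parts, W, lam, step, n_iters):
    """Generic DGD-like proximal-gradient sketch for distributed LASSO.

    Agent i holds (X_i, y_i); theta has shape (m, d), one row per agent.
    Update: theta_i <- prox_{step*lam}( sum_j W_ij theta_j - step * grad_i(theta_i) ).
    """
    m = len(X_parts)
    d = X_parts[0].shape[1]
    theta = np.zeros((m, d))
    for _ in range(n_iters):
        mixed = W @ theta                          # consensus/mixing step
        grads = np.stack([
            X.T @ (X @ theta[i] - y) / len(y)      # local least-squares gradient
            for i, (X, y) in enumerate(zip(X_parts, y_parts))
        ])
        theta = soft_threshold(mixed - step * grads, step * lam)
    return theta
```

With a sufficiently small step size the iterates converge linearly to a neighborhood of the centralized LASSO solution, which is the behavior the report says the paper validates numerically.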
Open Source Code: No
LLM Response: The paper provides an attribution license ('CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/21-1333.html.') but contains no statement or link indicating that the authors have released source code for the methodology described in the paper.
Open Datasets: Yes
LLM Response: We test our findings on the data set eyedata in the Normal Beta-Prime package (Bai and Ghosh, 2019).
Dataset Splits: Yes
LLM Response: We randomly divide the data set into a training sample set of size Ntrain = 80 and a test data set of size Ntest = 40. We partition the training data into m = 10 subsets.
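The reported protocol (random 80/40 train/test split, training rows partitioned across m = 10 agents) can be sketched as follows; the function name, equal-size partitioning, and seeding are assumptions, since the report does not specify them.

```python
import numpy as np

def split_and_partition(X, y, n_train, m, seed=0):
    """Randomly split (X, y) into train/test, then partition training rows
    across m agents. Mirrors the reported sizes (n_train = 80, test = 40,
    m = 10); equal-size partitioning is an assumption."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    X_tr, y_tr = X[train_idx], y[train_idx]
    # np.array_split distributes the n_train rows as evenly as possible.
    parts = [(X_tr[idx], y_tr[idx])
             for idx in np.array_split(np.arange(n_train), m)]
    return parts, (X[test_idx], y[test_idx])
```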
Hardware Specification: Yes
LLM Response: All the experiments were run on a server equipped with an Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz.
Software Dependencies: No
LLM Response: The paper mentions using the 'Normal Beta-Prime package (Bai and Ghosh, 2019)' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup: Yes
LLM Response: Experimental setup (synthetic data): The ground truth θ is set by randomly sampling a multivariate Gaussian N(0, I_d) and thresholding the d − s smallest elements to zero. The noise vector w is assumed to be multivariate Gaussian N(0, 0.25 I_N). We construct X ∈ R^{N×d} by independently generating each row x_i ∈ R^d, adopting the following procedure (Agarwal et al., 2012): let z_1, ..., z_{d−1} be i.i.d. standard normal random variables, set x_{i,1} = z_1 / sqrt(1 − 0.25^2) and x_{i,t+1} = 0.25 x_{i,t} + z_t, for t = 1, 2, ..., d − 1. It can be verified that all the eigenvalues of Σ = cov(x_i) lie within the interval [0.64, 2.85]. We partition (X, y) as X = [X_1^T, X_2^T, ..., X_m^T]^T and y = [y_1^T, ..., y_m^T]^T; agent i owns the data set portion (X_i, y_i), and there are m agents in total. We simulate an undirected graph G following the Erdős-Rényi model G(m, p), where m is the number of agents and p is the probability that an edge is independently included in the graph. The coefficients of the matrix W are chosen according to the lazy Metropolis rule (Olshevsky, 2017). All results use Monte Carlo averaging over 30 repetitions. 1) Statistical accuracy verification (Theorem 7): We set N = 220, m = 20, d = 400, and consider two types of graphs, namely a fully connected graph and a weakly connected graph, the latter generated as an Erdős-Rényi graph with edge probability p = 0.1, resulting in ρ ≈ 0.973.
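The quoted generation procedure, a sparse Gaussian ground truth, AR(1)-correlated design rows, Gaussian noise, and a lazy Metropolis mixing matrix, can be sketched as below. The seeding and the magnitude-based thresholding of θ are assumptions; the lazy Metropolis formula is the standard one (off-diagonal weight 1 / (2 max(deg_i, deg_j)) on edges, diagonal chosen so rows sum to one).

```python
import numpy as np

def make_synthetic(N, d, s, rho=0.25, noise_var=0.25, seed=0):
    """Generate (X, y, theta) following the quoted setup: theta is Gaussian
    with its d - s smallest-magnitude entries set to zero; each row of X
    follows the AR(1)-style recursion with coefficient 0.25."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(d)
    theta[np.argsort(np.abs(theta))[: d - s]] = 0.0   # keep s nonzeros
    z = rng.standard_normal((N, d - 1))               # z_1, ..., z_{d-1}
    X = np.empty((N, d))
    X[:, 0] = z[:, 0] / np.sqrt(1.0 - rho**2)         # x_{i,1}
    for t in range(d - 1):
        X[:, t + 1] = rho * X[:, t] + z[:, t]         # x_{i,t+1}
    y = X @ theta + np.sqrt(noise_var) * rng.standard_normal(N)
    return X, y, theta

def lazy_metropolis_weights(adj):
    """Lazy Metropolis mixing matrix for a 0/1 adjacency matrix adj."""
    deg = adj.sum(axis=1)
    m = adj.shape[0]
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if adj[i, j]:
                W[i, j] = 1.0 / (2.0 * max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()                    # rows sum to one
    return W
```

The resulting W is symmetric and doubly stochastic, which is what the spectral quantity ρ (≈ 0.973 for the weakly connected graph) is computed from.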