Communication-efficient Sparse Regression

Authors: Jason D. Lee, Qiang Liu, Yuekai Sun, Jonathan E. Taylor

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theoretical results with simulations. First, we study the estimation error of the averaged debiased lasso in ℓ2 norm. To focus on the effect of averaging, we grow the number of machines m linearly with the (total) sample size N. In other words, we fix the sample size per machine n and grow the total sample size N by adding machines. The tuning parameters were set to their oracle values stated in Theorem 12. Figure 1 compares the estimation error (in ℓ2 norm) of the averaged debiased lasso estimator with that of the centralized lasso. We see that the estimation error of the averaged debiased lasso estimator is comparable to that of the centralized lasso, while that of the naively averaged lasso is much worse. We conduct a second set of simulations to study the effect of the number of machines on the estimation error of the averaged estimator. To focus on the effect of the number of machines m, we fix the (total) sample size N and vary the number of machines the samples are distributed across. The tuning parameters were again set to the oracle values stated in Theorem 12. Figure 2 shows how the estimation error (in ℓ2 norm) of the averaged estimator grows as the number of machines grows. When the number of machines is small, the estimation error of the averaged estimator is comparable to that of the centralized lasso. However, when the number of machines exceeds a certain threshold, the estimation error grows with the number of machines.
Researcher Affiliation | Academia | Jason D. Lee (EMAIL), Marshall School of Business, University of Southern California, Los Angeles, CA 90089; Qiang Liu (EMAIL), Department of Computer Science, Dartmouth College, Hanover, NH 03755; Yuekai Sun (EMAIL), Department of Statistics, University of Michigan, Ann Arbor, MI 48109; Jonathan E. Taylor (EMAIL), Department of Statistics, Stanford University, Stanford, CA 94305
Pseudocode | No | The paper describes its methods and formulas mathematically, e.g. in equations (1)–(8), and provides detailed theoretical derivations and proofs. However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor any structured, code-like procedures.
Open Source Code | No | "In practice, it is possible to set the tuning parameter δ by the bisection method in the code accompanying Javanmard and Montanari (2013a); via bisection, they search for the smallest δ such that the optimization program in (2) is feasible." This sentence refers to code from another paper (Javanmard and Montanari, 2013a), not code released by the authors for the methodology described in this paper.
Open Datasets | No | We validate our theoretical results with simulations. First, we study the estimation error of the averaged debiased lasso in ℓ2 norm. To focus on the effect of averaging, we grow the number of machines m linearly with the (total) sample size N. In other words, we fix the sample size per machine n and grow the total sample size N by adding machines. The paper uses simulated/synthetic data for its experiments and does not provide access information for any public or open datasets.
Dataset Splits | No | Recall the problem setup: we are given N samples of the form (xi, yi) distributed across m machines: the k-th machine has local predictors Xk ∈ R^{nk×p} and responses yk ∈ R^{nk}. To keep things simple, we assume the data is evenly distributed, i.e. n1 = ··· = nm = n = N/m. The paper describes the distribution of generated data for simulation but does not specify training/test/validation splits for a pre-existing dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run its simulations or experiments.
Software Dependencies | No | For the purpose of comparison, let us assume that the lasso solver performs T iterations. Thus the complexity of solving a lasso in dimension p with n samples is O(npT). In the simple approach of Section 2, where each machine computes its own Θ̂, the parallel runtime is O(np²T). However, using the approach of Section 3, each machine computes only p/m rows of Θ̂, which brings the parallel runtime down to O(np²T/m). In comparison, solving the lasso with a state-of-the-art optimization method such as Agarwal et al. (2012) has parallel runtime O(npT), so our computational cost is larger by a factor of O(p/m). The paper refers to a generic 'lasso solver' and a 'state-of-the-art optimization method' but does not specify any software names or version numbers.
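The O(np²T/m) parallel runtime quoted above comes from each machine solving only its own block of the p nodewise lasso problems that define the rows of Θ̂. A minimal sketch of that row partitioning (with `partition_rows` as a hypothetical helper name; the per-block nodewise lasso solves themselves are elided):

```python
import numpy as np

def partition_rows(p, m):
    """Split the p row indices of the debiasing matrix Theta-hat into m
    nearly equal blocks, so machine k solves only ~p/m nodewise lasso
    problems instead of all p (parallel runtime O(n p^2 T / m))."""
    return np.array_split(np.arange(p), m)

blocks = partition_rows(p=500, m=8)
sizes = [len(b) for b in blocks]
# every row index lands in exactly one block, and block sizes differ by at most 1
```

Each machine would then run one nodewise regression per index in its block; the blocks only need to be gathered once at the end, which is what keeps the scheme communication-efficient.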
Experiment Setup | Yes | The tuning parameters were set to their oracle values stated in Theorem 12. Figure 1 compares the estimation error (in ℓ2 norm) of the averaged debiased lasso estimator with that of the centralized lasso. We see that the estimation error of the averaged debiased lasso estimator is comparable to that of the centralized lasso, while that of the naively averaged lasso is much worse... The lasso tuning parameter λ is set by first estimating the noise standard deviation σ from the residuals and then using the formula λ = σ√(2 log p).
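The experimental pipeline quoted in this row can be sketched in a few dozen lines. This is a minimal illustration, not the authors' code: it assumes an identity-covariance Gaussian design (so the precision surrogate M can be taken as the identity), treats the noise level σ as known rather than estimated from residuals, and uses a plain ISTA solver for the lasso.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Basic proximal-gradient (ISTA) solver for the lasso:
    minimize (1/(2n)) * ||y - X @ theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        z = theta - step * grad
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return theta

def debias(theta_hat, X, y, M):
    """One-step debiasing: theta_d = theta_hat + (1/n) * M @ X.T @ (y - X @ theta_hat)."""
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

rng = np.random.default_rng(0)
m, n, p, s, sigma = 8, 200, 50, 5, 0.5  # machines, per-machine n, dimension, sparsity, noise
theta_star = np.zeros(p)
theta_star[:s] = 1.0
M = np.eye(p)  # identity is a valid precision surrogate here because Sigma = I

# sigma is treated as known; the paper instead estimates it from the lasso residuals
lam = sigma * np.sqrt(2 * np.log(p) / n)  # oracle-style tuning for the (1/(2n)) loss scaling

debiased, naive = [], []
for _ in range(m):
    X = rng.standard_normal((n, p))                     # identity-covariance design
    y = X @ theta_star + sigma * rng.standard_normal(n)
    theta_k = ista_lasso(X, y, lam)
    naive.append(theta_k)                               # naive averaging keeps the lasso bias
    debiased.append(debias(theta_k, X, y, M))

theta_avg = np.mean(debiased, axis=0)  # averaged debiased lasso
theta_naive = np.mean(naive, axis=0)   # naive average of biased lasso fits
err_avg = np.linalg.norm(theta_avg - theta_star)
err_naive = np.linalg.norm(theta_naive - theta_star)
```

With these (hypothetical) parameter choices, `err_naive` typically stays bounded away from zero as machines are added, because averaging cancels noise but not the shared shrinkage bias, while `err_avg` tracks the centralized rate. This is exactly the contrast the row attributes to Figure 1.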