Debiased Distributed Learning for Sparse Partial Linear Models in High Dimensions
Authors: Shaogao Lv, Heng Lian
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, some simulated experiments are carried out to illustrate the empirical performances of our debiased technique under the distributed setting. |
| Researcher Affiliation | Academia | Shaogao Lv, Department of Statistics and Data Science, Nanjing Audit University, Nanjing, China; Heng Lian, Department of Mathematics, City University of Hong Kong, Hong Kong, China |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making code available, nor does it provide links to source code repositories. |
| Open Datasets | No | We generate the data from the model (1), where β = (1, 2, 1, 0.5, 2, 0, …, 0) and ε_i ~ N(0, 4). We then generate a vector Z_i in R^p from a mean-zero multivariate Gaussian distribution with correlations Cov(Z_{ij}, Z_{ij′}) = 0.3^\|j−j′\|, 1 ≤ j, j′ ≤ p, and then set T_i = Φ(Z_{i1}) and X_{ij} = Z_{ij}, j = 2, …, p, where Φ is the cumulative distribution function of the standard normal distribution, so that T_i ∈ (0, 1). |
| Dataset Splits | No | The paper describes how the total data N is randomly allocated to m machines for distributed processing, and specifies various N and m values in the simulations (e.g., N=2000, m=10). However, it does not provide specific training, test, or validation dataset splits for model evaluation. |
| Hardware Specification | No | The simulations are carried out on the computational cluster Katana in the University of New South Wales. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We select the tuning parameters in the penalties by 5-fold cross-validation in each local machine. We set N = 2000, m = 1, 10 (m = 1 is the centralized estimator) and p = 100, 200, 400, 800, 1600. We generate 200 data sets for each setting. |
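The simulation design quoted in the Open Datasets row can be sketched as follows. This is a minimal reconstruction, assuming the paper's model (1) is a partial linear model in which T_i enters nonparametrically and the remaining p − 1 Gaussian covariates enter linearly, so that β matches the dimension of X; the function name and the numpy-based implementation are mine, not the authors'.

```python
import numpy as np
from math import erf, sqrt

def generate_data(n, p, rho=0.3, noise_sd=2.0, seed=0):
    """Sketch of the paper's simulation design: AR(1)-correlated Gaussian
    covariates, a sparse coefficient vector, and N(0, 4) noise."""
    rng = np.random.default_rng(seed)
    # Sparse truth: beta = (1, 2, 1, 0.5, 2, 0, ..., 0); assumed to match
    # the p - 1 linear covariates X_{i2}, ..., X_{ip}.
    beta = np.zeros(p - 1)
    beta[:5] = [1, 2, 1, 0.5, 2]
    # Z_i ~ N(0, Sigma) with Cov(Z_ij, Z_ij') = rho^|j - j'|
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Nonparametric covariate T_i = Phi(Z_i1), which lies in (0, 1)
    T = np.array([0.5 * (1.0 + erf(z / sqrt(2.0))) for z in Z[:, 0]])
    # Linear covariates X_ij = Z_ij for j = 2, ..., p
    X = Z[:, 1:]
    # Noise eps_i ~ N(0, 4), i.e. standard deviation 2
    eps = rng.normal(0.0, noise_sd, size=n)
    return X, T, beta, eps
```

The response would then be assembled as Y_i = X_i'β + g(T_i) + ε_i for the paper's nonparametric component g, which is not specified in the excerpt above and is therefore left out here.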
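The distributed protocol described in the Dataset Splits and Experiment Setup rows, allocating N samples at random across m machines and combining local estimates, can be sketched as below. This is a hedged illustration only: the helper name `distribute_and_average` is mine, and plain least squares stands in for the paper's local debiased lasso (with 5-fold cross-validated tuning), which is not reproduced here.

```python
import numpy as np

def distribute_and_average(X, y, m, local_estimator, seed=0):
    """Randomly allocate the N rows of (X, y) to m machines, run a local
    estimator on each shard, and average the m local estimates (one-shot
    averaging; m = 1 recovers the centralized estimator)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    shards = np.array_split(perm, m)  # near-equal shards of size ~ N / m
    estimates = [local_estimator(X[idx], y[idx]) for idx in shards]
    return np.mean(estimates, axis=0)

# Toy usage mirroring N = 2000, m = 10, with OLS as the placeholder
# local estimator (the paper uses a debiased lasso instead).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
beta = np.array([1.0, 2.0, 1.0, 0.5, 2.0])
y = X @ beta + rng.normal(0.0, 2.0, size=2000)
ols = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
beta_hat = distribute_and_average(X, y, m=10, local_estimator=ols)
```

In the high-dimensional setting the paper targets (p up to 1600 with only 200 samples per machine), the local lasso estimates are biased by regularization, which is exactly why the debiasing step is needed before averaging.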