Distributed High-dimensional Regression Under a Quantile Loss Function
Authors: Xi Chen, Weidong Liu, Xiaojun Mao, Zhuoyi Yang
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The simulation analysis is provided to demonstrate the effectiveness of our method. Keywords: Distributed estimation, high-dimensional linear model, quantile loss, robust estimator, support recovery |
| Researcher Affiliation | Academia | Xi Chen EMAIL Stern School of Business, New York University, New York, NY 10012, USA; Weidong Liu EMAIL School of Mathematical Sciences and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, 200240, China; Xiaojun Mao EMAIL School of Data Science, Fudan University, Shanghai, 200433, China; Zhuoyi Yang EMAIL Stern School of Business, New York University, New York, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1 Distributed high-dimensional QR estimator |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code developed for the methodology described. |
| Open Datasets | No | We consider the following linear model $Y_i = X_i^\top \beta + e_i$, $i = 1, 2, \ldots, n$, where $X_i = (1, X_{i,1}, \ldots, X_{i,p})^\top$ is a $(p+1)$-dimensional covariate vector and the $(X_{i,1}, \ldots, X_{i,p})$ are drawn i.i.d. from a multivariate normal distribution $N(0, \Sigma)$. The paper uses synthetic data generated according to this model and does not provide access information for any public datasets. |
| Dataset Splits | No | The paper describes generating synthetic data and varying parameters like sample size (n) and local sample size (m) for simulations. While it discusses data distribution across 'L' machines, this relates to the distributed computing setup, not explicit training/validation/test splits of a specific dataset for reproducibility. For example, it states: "We fix the sample size n = 10000, local sample size m = 500, the sparsity level s = 20 and dimension p = 500." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only contains a general statement in the introduction: "For example, a personal computer usually has a limited memory size in GBs". |
| Software Dependencies | No | The paper mentions: "In our experiments, we adopt the PSSgb optimization method for solving (18)." and "To solve the ℓ1-regularized QR estimator, we formulate it into a standard linear programming problem (LP) and solve it by Gurobi (Gurobi Optimization, 2020), which is the state-of-the-art LP solver." While Gurobi is mentioned with a year, a specific version number is not provided, and PSSgb lacks any version information. Therefore, a fully reproducible description of ancillary software with specific version numbers is not present for all key components. |
| Experiment Setup | Yes | We fix the sample size n = 10000, local sample size m = 500, the sparsity level s = 20 and dimension p = 500. We plot the ℓ2-error from the true QR coefficients versus the number of iterations. Since the Avg-DC only requires one-shot communication, we use a horizontal line to show its performance. The results are shown in Figure 1. From the result, both pooled REL and distributed REL outperform the Avg-DC algorithm and become stable after a few iterations. Therefore, for the rest of the experiments in this section, we use 50 as the number of iterations in the algorithm. Moreover, the distributed REL almost matches the performance of pooled REL for all three noises. |
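The data-generating model quoted above ($Y_i = X_i^\top \beta + e_i$ with Gaussian covariates and an intercept) is concrete enough to sketch. The snippet below is a minimal illustration, not the paper's exact simulation code: the identity covariance, the $s$-sparse coefficient pattern, and the $t$-distributed noise are illustrative assumptions.

```python
import numpy as np

def generate_data(n=200, p=50, s=5, seed=None):
    """Draw (X, y) from Y_i = X_i' beta + e_i with N(0, I_p) covariates.

    Identity Sigma, unit active coefficients, and t(3) noise are
    illustrative choices; the paper varies these settings.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(p + 1)
    beta[: s + 1] = 1.0                   # intercept plus s active coefficients
    Z = rng.standard_normal((n, p))       # covariates ~ N(0, I_p)
    X = np.hstack([np.ones((n, 1)), Z])   # prepend the intercept column
    e = rng.standard_t(df=3, size=n)      # heavy-tailed noise, illustrative
    y = X @ beta + e
    return X, y, beta
```

Swapping in a non-identity `Sigma` would only require replacing `standard_normal` with `rng.multivariate_normal`.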
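The Software Dependencies row notes that the paper casts the ℓ1-regularized quantile regression (QR) estimator as a linear program and solves it with Gurobi. The reformulation itself is standard and can be sketched with scipy's open-source HiGHS solver instead; this is a generic ℓ1-QR LP, not the authors' implementation, and the choice of `tau` and `lam` below is arbitrary. Splitting the residuals into $u_i - v_i$ and the coefficients into $\beta^+ - \beta^-$ makes the check loss and the ℓ1 penalty linear.

```python
import numpy as np
from scipy.optimize import linprog

def l1_quantile_regression(X, y, tau=0.5, lam=0.1):
    """Solve min_beta sum_i rho_tau(y_i - x_i' beta) + lam * ||beta||_1 as an LP.

    Decision vector z = [beta_plus, beta_minus, u, v], all nonnegative, with
    X beta_plus - X beta_minus + u - v = y and objective
    lam * 1'(beta_plus + beta_minus) + tau * 1'u + (1 - tau) * 1'v.
    """
    n, p = X.shape
    c = np.concatenate([lam * np.ones(2 * p),
                        tau * np.ones(n),
                        (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    beta = res.x[:p] - res.x[p:2 * p]
    return beta, res
```

For the dimensions used in the paper ($p = 500$, $n = 10000$) a commercial solver such as Gurobi with sparse constraint matrices would be the practical choice; the dense `A_eq` above is only suitable for small examples.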