Ultra-High Dimensional Single-Index Quantile Regression
Authors: Yuankun Zhang, Heng Lian, Yan Yu
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of Monte Carlo simulations and an application to gene expression data demonstrate the effectiveness of the proposed models and estimation method. In Section 4, we conduct Monte Carlo simulations to evaluate the performance of the proposed method and apply the semiparametric model and the penalized estimation to a gene expression data set. |
| Researcher Affiliation | Academia | Yuankun Zhang, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, USA. Heng Lian, Department of Mathematics, City University of Hong Kong, Hong Kong SAR. Yan Yu, Department of Operations, Business Analytics, and Information Systems, University of Cincinnati, Cincinnati, OH 45221, USA. |
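| Pseudocode | Yes | In summary, the iterative algorithm we propose to use can be carried out as follows: Step 0. Initialize $(\hat{\beta}^{(0)}, \hat{\alpha}^{(0)})$. Step 1. Given $\hat{\beta}^{(k-1)}$, construct B-spline basis functions $\Pi(z^T\hat{\beta}^{(k-1)})$, then the spline coefficient estimates are obtained from $\hat{\theta}^{(k)} = \arg\min_{\theta} \sum_{i=1}^{n} \rho_\tau\big(y_i - \Pi(z_i^T\hat{\beta}^{(k-1)})^T\theta - x_i^T\hat{\alpha}^{(k-1)}\big)$. Step 2. Given the estimated spline coefficients $\hat{\theta}^{(k)}$, the $k$th-step penalized estimator of the single-index parameters $\hat{\beta}^{(k)}$ and partially linear parameters $\hat{\alpha}^{(k)}$ will be achieved by the minimization of $\tilde{Q}(\beta, \alpha) = \sum_{i=1}^{n} \rho_\tau(\tilde{y}_i - \tilde{z}_i^T\beta - x_i^T\alpha) + n\sum_{j=1} p'_{\lambda_1}(\lvert\hat{\beta}_j^{(k-1)}\rvert)\lvert\beta_j\rvert + n\sum_{j=1} p'_{\lambda_2}(\lvert\hat{\alpha}_j^{(k-1)}\rvert)\lvert\alpha_j\rvert$ (7), where $\tilde{y}_i = y_i - \Pi(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} + \Pi'(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} z_i^T\hat{\beta}^{(k-1)}$ and $\tilde{z}_i = \Pi'(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} z_i$. Repeat Steps 1 and 2 until convergence. |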
| Open Source Code | No | The paper does not explicitly provide a link to source code, state that code is released, or mention that code is included in supplementary materials. |
| Open Datasets | Yes | The gene expression data and phenotype data can be found at GEO (http://www.nci.nih.gov/geo; accession number GSE3330). They examined the genetics of two inbred mouse populations segregating for obesity and diabetes. The sample that was used to monitor the expression level of a total of 22,575 genes consists of a total of 60 subjects with approximately half male mice and half female mice. The experiment was conducted by Lan et al. (2006). |
| Dataset Splits | Yes | To compare the performance of different methods, we randomly split the original dataset into 5 testing data sets without replacement. Each testing data set contains 12 observations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | It is indeed a linear quantile regression and can be solved by many existing statistical software packages, for instance, the R quantreg package by Roger Koenker et al. However, no specific version number for the 'quantreg' package or any other software is provided. |
| Experiment Setup | Yes | For the penalty function part, we select the tuning parameters λ by minimizing the high-dimensional BIC defined in (8). In addition, tuning parameter a in SCAD penalty is set to be 3.7 as suggested by Fan and Li (2001). To estimate the unknown function nonparametrically, cubic B-spline bases with equally-spaced knots are adopted throughout the empirical studies, and the number of interior knots is taken to be two, since we found this works well in our examples. |
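
The building blocks named in the table rows above can be illustrated with a minimal sketch. It is not the authors' code (the paper releases none). It shows the quantile check loss $\rho_\tau$, the SCAD penalty derivative with $a = 3.7$ that supplies the per-coefficient weights in Step 2, and a random split of 60 observations into 5 disjoint test sets of 12. All function names and the seed are hypothetical choices made for illustration, and the SCAD derivative follows the standard form from Fan and Li (2001).

```python
import random


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|.

    Equals lam for |t| <= lam, and max(a*lam - |t|, 0) / (a - 1) otherwise.
    The default a = 3.7 matches the value reported in the Experiment Setup row.
    """
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def penalty_weights(prev_coefs, lam, a=3.7):
    """Weights applied to |beta_j| in Step 2, computed from the previous iterate.

    Hypothetical helper: large previous coefficients get weight 0 (no shrinkage),
    small ones get the full weight lam.
    """
    return [scad_derivative(b, lam, a) for b in prev_coefs]


def random_test_splits(n_obs=60, n_splits=5, seed=0):
    """Randomly partition indices 0..n_obs-1 into n_splits disjoint test sets.

    The seed is an assumption made here for reproducibility of the sketch.
    """
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    size = n_obs // n_splits
    return [sorted(indices[k * size:(k + 1) * size]) for k in range(n_splits)]


if __name__ == "__main__":
    print(check_loss(-2.0, 0.25))
    print(penalty_weights([0.0, 1.0, 5.0], lam=0.5))
    print([len(s) for s in random_test_splits()])
```

In a full implementation, the weighted L1 problem in Step 2 would be passed to a linear quantile regression solver such as the R quantreg package mentioned above. The sketch stops short of that step to stay free of external dependencies.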