Ultra-High Dimensional Single-Index Quantile Regression

Authors: Yuankun Zhang, Heng Lian, Yan Yu

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results of Monte Carlo simulations and an application to gene expression data demonstrate the effectiveness of the proposed models and estimation method. In Section 4, we conduct Monte Carlo simulations to evaluate the performance of the proposed method and apply the semiparametric model and the penalized estimation to a gene expression data set.
Researcher Affiliation | Academia | Yuankun Zhang EMAIL Department of Mathematical Sciences University of Cincinnati Cincinnati, OH 45221, USA. Heng Lian EMAIL Department of Mathematics City University of Hong Kong Hong Kong SAR. Yan Yu EMAIL Department of Operations, Business Analytics, and Information Systems University of Cincinnati Cincinnati, OH 45221, USA.
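Pseudocode | Yes | In summary, the iterative algorithm we propose to use can be carried out as follows:
Step 0. Initialize $(\hat\beta^{(0)}, \hat\alpha^{(0)})$.
Step 1. Given $\hat\beta^{(k-1)}$, construct B-spline basis functions $\Pi(z^T\hat\beta^{(k-1)})$, then the spline coefficient estimates are obtained from
$$\hat\theta^{(k)} = \arg\min_{\theta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - \Pi(z_i^T\hat\beta^{(k-1)})^T\theta - x_i^T\hat\alpha^{(k-1)}\bigr).$$
Step 2. Given the estimated spline coefficients $\hat\theta^{(k)}$, the $k$th-step penalized estimator of the single-index parameters $\hat\beta^{(k)}$ and partially linear parameters $\hat\alpha^{(k)}$ will be achieved by the minimization of
$$\tilde Q(\beta, \alpha) = \sum_{i=1}^{n} \rho_\tau\bigl(\tilde y_i - \tilde z_i^T\beta - x_i^T\alpha\bigr) + n\sum_{j} p'_{\lambda_1}\bigl(|\hat\beta_j^{(k-1)}|\bigr)\,|\beta_j| + n\sum_{j} p'_{\lambda_2}\bigl(|\hat\alpha_j^{(k-1)}|\bigr)\,|\alpha_j|, \tag{7}$$
where $\tilde y_i = y_i - \Pi(z_i^T\hat\beta^{(k-1)})^T\hat\theta^{(k)} + \Pi'(z_i^T\hat\beta^{(k-1)})^T\hat\theta^{(k)}\, z_i^T\hat\beta^{(k-1)}$ and $\tilde z_i = \Pi'(z_i^T\hat\beta^{(k-1)})^T\hat\theta^{(k)}\, z_i$.
Repeat Steps 1 and 2 until convergence.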
Open Source Code | No | The paper does not explicitly provide a link to source code, state that code is released, or mention that code is included in supplementary materials.
Open Datasets | Yes | The gene expression data and phenotype data can be found at GEO (http://www.nci.nih.gov/geo; accession number GSE3330). They examined the genetics of two inbred mouse populations segregating for obesity and diabetes. The sample that was used to monitor the expression level of a total of 22,575 genes consists of a total of 60 subjects with approximately half male mice and half female mice. The experiment was conducted by Lan et al. (2006).
Dataset Splits | Yes | To compare the performance of different methods, we randomly split the original dataset into 5 testing data sets without replacement. Each testing data set contains 12 observations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | It is indeed a linear quantile regression and can be solved by many existing statistical software packages, for instance, the R quantreg package by Roger Koenker et al. However, no specific version number for the 'quantreg' package or any other software is provided.
Experiment Setup | Yes | For the penalty function part, we select the tuning parameters λ by minimizing the high-dimensional BIC defined in (8). In addition, tuning parameter a in SCAD penalty is set to be 3.7 as suggested by Fan and Li (2001). To estimate the unknown function nonparametrically, cubic B-spline bases with equally-spaced knots are adopted throughout the empirical studies, and the number of interior knots is taken to be two, since we found this works well in our examples.
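
The working response and working covariate in Step 2 of the Pseudocode row are consistent with a first-order Taylor expansion of the spline fit around the previous iterate. The derivation below is a sketch, not taken from the paper. It uses the shorthand $\hat\beta = \hat\beta^{(k-1)}$ and $\hat\theta = \hat\theta^{(k)}$, and assumes $\Pi'$ denotes the elementwise derivative of the B-spline basis.

```latex
\begin{aligned}
\Pi(z_i^T\beta)^T\hat\theta
  &\approx \Pi(z_i^T\hat\beta)^T\hat\theta
     + \Pi'(z_i^T\hat\beta)^T\hat\theta\,\bigl(z_i^T\beta - z_i^T\hat\beta\bigr), \\
y_i - \Pi(z_i^T\beta)^T\hat\theta - x_i^T\alpha
  &\approx \underbrace{y_i - \Pi(z_i^T\hat\beta)^T\hat\theta
     + \Pi'(z_i^T\hat\beta)^T\hat\theta\, z_i^T\hat\beta}_{\tilde y_i}
   \;-\; \underbrace{\Pi'(z_i^T\hat\beta)^T\hat\theta\, z_i^T}_{\tilde z_i^T}\,\beta
   \;-\; x_i^T\alpha .
\end{aligned}
```

After this linearization the residual is linear in $(\beta, \alpha)$, which is why Step 2 reduces to a weighted $\ell_1$-penalized linear quantile regression.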
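
The Dataset Splits row describes partitioning the 60 observations into 5 disjoint testing sets of 12. A minimal sketch of such a partition follows, using only the Python standard library. The function name `split_into_folds` and the fixed seed are illustrative assumptions; the paper does not state how the random split was generated.

```python
import random


def split_into_folds(n_obs, n_folds, seed=0):
    """Randomly partition indices 0..n_obs-1 into n_folds disjoint, equal-sized sets.

    The seed is a hypothetical choice for repeatability; it is not from the paper.
    """
    if n_obs % n_folds != 0:
        raise ValueError("n_obs must be divisible by n_folds for equal-sized folds")
    indices = list(range(n_obs))
    rng = random.Random(seed)
    rng.shuffle(indices)  # sampling without replacement: each index is used exactly once
    fold_size = n_obs // n_folds
    return [indices[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]


# 60 subjects split into 5 testing sets of 12 observations each.
folds = split_into_folds(60, 5)
for k, test_idx in enumerate(folds):
    # Assumption: the remaining 48 observations would serve as the training set.
    train_idx = [i for i in range(60) if i not in set(test_idx)]
    print(k, len(test_idx), len(train_idx))
```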
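
The Experiment Setup row fixes the SCAD parameter at a = 3.7. Assuming the weights in objective (7) are derivatives of the SCAD penalty evaluated at the previous iterate, they can be computed with the standard derivative formula from Fan and Li (2001). This is a sketch, not the authors' code; the function name `scad_derivative` and the example values are illustrative.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at |t| (Fan and Li, 2001).

    p'_lam(t) = lam                       if |t| <= lam
              = max(a*lam - |t|, 0)/(a-1) if |t| >  lam
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


# Hypothetical previous-step coefficients; larger ones receive smaller L1 weights,
# and coefficients beyond a*lam are not penalized at all.
previous_beta = [0.0, 0.4, 1.0, 3.0]
weights = [scad_derivative(b, lam=0.5) for b in previous_beta]
print(weights)
```

Because the weight falls to zero once a coefficient exceeds a·λ, large signals are left essentially unshrunk while small coefficients receive the full lasso-type weight λ.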