reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Ultra-High Dimensional Single-Index Quantile Regression

Authors: Yuankun Zhang, Heng Lian, Yan Yu

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The results of Monte Carlo simulations and an application to gene expression data demonstrate the effectiveness of the proposed models and estimation method. In Section 4, we conduct Monte Carlo simulations to evaluate the performance of the proposed method and apply the semiparametric model and the penalized estimation to a gene expression data set.
Researcher Affiliation	Academia	Yuankun Zhang EMAIL Department of Mathematical Sciences University of Cincinnati Cincinnati, OH 45221, USA. Heng Lian EMAIL Department of Mathematics City University of Hong Kong Hong Kong SAR. Yan Yu EMAIL Department of Operations, Business Analytics, and Information Systems University of Cincinnati Cincinnati, OH 45221, USA.
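Pseudocode	Yes	In summary, the iterative algorithm we propose to use can be carried out as follows: Step 0. Initialize $(\hat{\beta}^{(0)}, \hat{\alpha}^{(0)})$. Step 1. Given $\hat{\beta}^{(k-1)}$, construct B-spline basis functions $\Pi(z^T\hat{\beta}^{(k-1)})$, then the spline coefficient estimates are obtained from $\hat{\theta}^{(k)} = \arg\min_{\theta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - \Pi(z_i^T\hat{\beta}^{(k-1)})^T\theta - x_i^T\hat{\alpha}^{(k-1)}\bigr)$. Step 2. Given the estimated spline coefficients $\hat{\theta}^{(k)}$, the $k$th-step penalized estimator of the single-index parameters $\hat{\beta}^{(k)}$ and partially linear parameters $\hat{\alpha}^{(k)}$ will be achieved by the minimization of $\tilde{Q}(\beta, \alpha) = \sum_{i=1}^{n} \rho_\tau(\tilde{y}_i - \tilde{z}_i^T\beta - x_i^T\alpha) + n\sum_{j} p'_{\lambda_1}(\lvert\hat{\beta}_j^{(k-1)}\rvert)\lvert\beta_j\rvert + n\sum_{j} p'_{\lambda_2}(\lvert\hat{\alpha}_j^{(k-1)}\rvert)\lvert\alpha_j\rvert$, (7) where $\tilde{y}_i = y_i - \Pi(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} + \Pi'(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} z_i^T\hat{\beta}^{(k-1)}$ and $\tilde{z}_i = \Pi'(z_i^T\hat{\beta}^{(k-1)})^T\hat{\theta}^{(k)} z_i$. Repeat Steps 1 and 2 until convergence.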
Open Source Code	No	The paper does not explicitly provide a link to source code, state that code is released, or mention that code is included in supplementary materials.
Open Datasets	Yes	The gene expression data and phenotype data can be found at GEO (http://www.nci.nih.gov/geo; accession number GSE3330). They examined the genetics of two inbred mouse populations segregating for obesity and diabetes. The sample that was used to monitor the expression level of a total of 22,575 genes consists of a total of 60 subjects with approximately half male mice and half female mice. The experiment was conducted by Lan et al. (2006).
Dataset Splits	Yes	To compare the performance of different methods, we randomly split the original dataset into 5 testing data sets without replacement. Each testing data set contains 12 observations.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	It is indeed a linear quantile regression and can be solved by many existing statistical software packages, for instance, the R quantreg package by Roger Koenker et al. However, no specific version number for the 'quantreg' package or any other software is provided.
Experiment Setup	Yes	For the penalty function part, we select the tuning parameters λ by minimizing the high-dimensional BIC defined in (8). In addition, tuning parameter a in SCAD penalty is set to be 3.7 as suggested by Fan and Li (2001). To estimate the unknown function nonparametrically, cubic B-spline bases with equally-spaced knots are adopted throughout the empirical studies, and the number of interior knots is taken to be two, since we found this works well in our examples.
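
Since no source code is released, the building blocks quoted in the Pseudocode, Dataset Splits, and Experiment Setup rows can be sketched as below. This is a minimal illustration, not the authors' implementation: it shows the check loss ρτ, the SCAD derivative with a = 3.7 that supplies the penalty weights in Step 2 (assuming the penalty in (7) is the local linear approximation of SCAD), and a random split of 60 observations into 5 testing sets of 12. All function names and the seed are hypothetical and do not come from the paper.

```python
import random


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0; a = 3.7 follows Fan and Li (2001)."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def lla_weights(prev_coefs, lam, a=3.7):
    """Weights applied to |b_j| in Step 2, evaluated at the previous iterate.

    Assumption: the weights are the SCAD derivative at |b_j^(k-1)|.
    """
    return [scad_derivative(abs(b), lam, a) for b in prev_coefs]


def split_into_folds(n_obs=60, n_folds=5, seed=0):
    """Randomly partition indices 0..n_obs-1 into equal-sized testing sets.

    The seed is a hypothetical choice; the paper does not report one.
    """
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    size = n_obs // n_folds
    return [idx[k * size:(k + 1) * size] for k in range(n_folds)]


if __name__ == "__main__":
    # Coefficients far from zero get weight 0, so they are not shrunk further.
    print(lla_weights([0.0, 0.5, 5.0], lam=1.0))
    print([len(fold) for fold in split_into_folds()])
```

Under this weighting, Step 2 reduces to a weighted L1-penalized linear quantile regression, which is the kind of problem the Software Dependencies row says can be handled by existing packages such as R quantreg.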