Prediction Risk for the Horseshoe Regression

Authors: Anindya Bhadra, Jyotishka Datta, Yunfan Li, Nicholas G. Polson, Brandon Willard

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical demonstrations of improved prediction over competing approaches in simulations and in a pharmacogenomics data set confirm our theoretical findings. Keywords: Global-Local Priors, Principal Components, Shrinkage Regression, Stein's Unbiased Risk Estimate
Researcher Affiliation | Academia | Anindya Bhadra, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA; Jyotishka Datta, Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Yunfan Li, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA; Nicholas G. Polson, Booth School of Business, University of Chicago, Chicago, IL 60637, USA; Brandon Willard, Booth School of Business, University of Chicago, Chicago, IL 60637, USA
Pseudocode | No | The paper presents theoretical results, theorems, and mathematical derivations (e.g., Theorem 4.1, Theorem 5.1) but does not include any explicitly labeled pseudocode blocks or algorithms with structured steps.
Open Source Code | No | The paper mentions 'Supplementary Material to Prediction risk for the horseshoe regression' for additional simulations but does not explicitly state that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | The data were originally described by Szakács et al. (2004), in which the authors studied 60 cancer cell lines in the publicly available NCI-60 database (https://dtp.cancer.gov/discovery_development/nci60/).
Dataset Splits | Yes | To test the performance of the methods, we split each data set into training and testing sets, with 75% (45 out of 60) of the observations in the training sets.
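The 75%/25% split described above (45 training and 15 testing observations out of the 60 NCI-60 cell lines) could be sketched as follows; the variable names and the fixed seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen for illustration only

n_obs = 60                       # 60 NCI-60 cancer cell lines
idx = rng.permutation(n_obs)     # random ordering of observation indices

# 75% of observations (45) go to training, the remaining 25% (15) to testing.
train_idx, test_idx = idx[:45], idx[45:]
```

In practice the paper repeats such splits to compare methods; any repetition scheme would simply redraw the permutation.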
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper discusses various regression methods, including ridge regression (RR), the lasso (LASSO), principal components regression (PCR), the horseshoe regression (HS), the adaptive lasso, the minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD), but does not specify any software libraries or frameworks with version numbers.
Experiment Setup | Yes | We simulate data where n = 100, and consider the cases p = 100, 200, 300, 400, 500. Let B be a p × k factor loading matrix, with all entries equal to 1. Let Fi be a k × 1 matrix of factor values, with all entries drawn independently from N(0, 1). The ith row of the n × p design matrix X is generated by a factor model, with number of factors k = 8, as follows: Xi = BFi + ξi, ξi ∼ N(0, 0.1), for i = 1, . . . , n. ... The observations y are generated from Equation (3) with σ² = 1, where for the true orthogonalized regression coefficients α0, the 6th, 30th, 57th, 67th, and 96th components are randomly selected as signals, and the remaining 95 components are noise terms. Coefficients of the signals are generated from a N(10, 0.5) distribution, and coefficients of the noise terms are generated from a N(0, 0.5) distribution. ... The tuning parameters in ridge regression, the lasso, the adaptive lasso, SCAD, and MCP are chosen by five-fold cross-validation on the training data. Similarly, the number of components in PCR and the global shrinkage parameter τ for the horseshoe regression are chosen by cross-validation.
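The simulated-data generation quoted above can be sketched for the p = 100 case as follows. Several details are assumptions on my part: Equation (3) of the paper is not reproduced in this excerpt, so the orthogonalized model is approximated here via the SVD of X; the second parameter of each normal distribution (0.5, 0.1) is read as a variance; and the fixed seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen for illustration only

n, p, k = 100, 100, 8            # sample size, dimension, number of factors
sigma2 = 1.0                     # noise variance for y

# Factor-model design: X_i = B F_i + xi_i, with B a p x k matrix of ones,
# F_i ~ N(0, I_k), and entrywise noise xi_i ~ N(0, 0.1).
B = np.ones((p, k))
F = rng.normal(size=(n, k))
X = F @ B.T + rng.normal(scale=np.sqrt(0.1), size=(n, p))

# Orthogonalized coefficients alpha0: signal components ~ N(10, 0.5) at the
# 6th, 30th, 57th, 67th, and 96th positions (0-based below), the remaining
# 95 noise components ~ N(0, 0.5).
signal_idx = np.array([5, 29, 56, 66, 95])
alpha0 = rng.normal(loc=0.0, scale=np.sqrt(0.5), size=p)
alpha0[signal_idx] = rng.normal(loc=10.0, scale=np.sqrt(0.5), size=signal_idx.size)

# Responses from an orthogonalized linear model y = U D alpha0 + eps with
# eps ~ N(0, sigma2 I); the use of the SVD of X here stands in for the
# paper's Equation (3), which is not shown in this excerpt.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
y = U @ (d * alpha0) + rng.normal(scale=np.sqrt(sigma2), size=n)
```

Extending to p = 200, ..., 500 only changes the shapes of B, X, and alpha0; the subsequent cross-validated tuning of each method would be run on a training subset of these data.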