Prediction Risk for the Horseshoe Regression

Authors: Anindya Bhadra, Jyotishka Datta, Yunfan Li, Nicholas G. Polson, Brandon Willard

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical demonstrations of improved prediction over competing approaches in simulations and in a pharmacogenomics data set confirm our theoretical findings. Keywords: Global-Local Priors, Principal Components, Shrinkage Regression, Stein's Unbiased Risk Estimate
Researcher Affiliation | Academia | Anindya Bhadra, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA; Jyotishka Datta, Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Yunfan Li, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA; Nicholas G. Polson, Booth School of Business, University of Chicago, Chicago, IL 60637, USA; Brandon Willard, Booth School of Business, University of Chicago, Chicago, IL 60637, USA
Pseudocode | No | The paper presents theoretical results, theorems, and mathematical derivations (e.g., Theorem 4.1, Theorem 5.1) but does not include any explicitly labeled pseudocode blocks or algorithms with structured steps.
Open Source Code | No | The paper mentions 'Supplementary Material to Prediction risk for the horseshoe regression' for additional simulations but does not explicitly state that the source code for the described methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | The data were originally described by Szakács et al. (2004), in which the authors studied 60 cancer cell lines in the publicly available NCI-60 database (https://dtp.cancer.gov/discovery_development/nci60/).
Dataset Splits | Yes | To test the performance of the methods, we split each data set into training and testing sets, with 75% (45 out of 60) of the observations in the training sets.
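The 75%/25% split described above (45 training and 15 testing observations out of the 60 NCI-60 cell lines) could be sketched as follows; the variable names and the fixed seed are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen for illustration only

n_obs = 60                       # 60 NCI-60 cancer cell lines
idx = rng.permutation(n_obs)     # random ordering of observation indices

# 75% of observations (45) go to training, the remaining 25% (15) to testing.
train_idx, test_idx = idx[:45], idx[45:]
```

In practice the paper repeats such splits to compare methods; any repetition scheme would simply redraw the permutation.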
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper discusses various regression methods, including ridge regression (RR), the lasso (LASSO), principal components regression (PCR), the horseshoe regression (HS), the adaptive lasso, the minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD), but does not specify any software libraries or frameworks with version numbers.
Experiment Setup | Yes | We simulate data where n = 100, and consider the cases p = 100, 200, 300, 400, 500. Let B be a p × k factor loading matrix, with all entries equal to 1. Let Fi be a k × 1 matrix of factor values, with all entries drawn independently from N(0, 1). The ith row of the n × p design matrix X is generated by a factor model, with number of factors k = 8, as follows: Xi = BFi + ξi, ξi ∼ N(0, 0.1), for i = 1, . . . , n. ... The observations y are generated from Equation (3) with σ² = 1, where for the true orthogonalized regression coefficients α0, the 6th, 30th, 57th, 67th, and 96th components are randomly selected as signals, and the remaining 95 components are noise terms. Coefficients of the signals are generated from a N(10, 0.5) distribution, and coefficients of the noise terms are generated from a N(0, 0.5) distribution. ... The tuning parameters in ridge regression, the lasso, the adaptive lasso, SCAD, and MCP are chosen by five-fold cross-validation on the training data. Similarly, the number of components in PCR and the global shrinkage parameter τ for the horseshoe regression are chosen by cross-validation.
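The simulated-data generation quoted above can be sketched for the p = 100 case as follows. Several details are assumptions on my part: Equation (3) of the paper is not reproduced in this excerpt, so the orthogonalized model is approximated here via the SVD of X; the second parameter of each normal distribution (0.5, 0.1) is read as a variance; and the fixed seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen for illustration only

n, p, k = 100, 100, 8            # sample size, dimension, number of factors
sigma2 = 1.0                     # noise variance for y

# Factor-model design: X_i = B F_i + xi_i, with B a p x k matrix of ones,
# F_i ~ N(0, I_k), and entrywise noise xi_i ~ N(0, 0.1).
B = np.ones((p, k))
F = rng.normal(size=(n, k))
X = F @ B.T + rng.normal(scale=np.sqrt(0.1), size=(n, p))

# Orthogonalized coefficients alpha0: signal components ~ N(10, 0.5) at the
# 6th, 30th, 57th, 67th, and 96th positions (0-based below), the remaining
# 95 noise components ~ N(0, 0.5).
signal_idx = np.array([5, 29, 56, 66, 95])
alpha0 = rng.normal(loc=0.0, scale=np.sqrt(0.5), size=p)
alpha0[signal_idx] = rng.normal(loc=10.0, scale=np.sqrt(0.5), size=signal_idx.size)

# Responses from an orthogonalized linear model y = U D alpha0 + eps with
# eps ~ N(0, sigma2 I); the use of the SVD of X here stands in for the
# paper's Equation (3), which is not shown in this excerpt.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
y = U @ (d * alpha0) + rng.normal(scale=np.sqrt(sigma2), size=n)
```

Extending to p = 200, ..., 500 only changes the shapes of B, X, and alpha0; the subsequent cross-validated tuning of each method would be run on a training subset of these data.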