reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Cluster Elastic Net for Multivariate Regression

Authors: Bradley S. Price, Ben Sherwood

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations and data examples from business operations and genomics are presented to show the merits of both the least squares and binomial methods.
Researcher Affiliation	Academia	Bradley S. Price, College of Business and Economics, West Virginia University, Morgantown, WV 26505, USA; Ben Sherwood, School of Business, University of Kansas, Lawrence, KS 66045, USA
Pseudocode	Yes	We propose a two-step iterative procedure to obtain a local minimum. 1. Begin with initial estimates, $\hat{\beta}_1^{1}, \ldots, \hat{\beta}_r^{1}$. 2. For the $w$th step, where $w > 1$, repeat the steps below until the group estimates do not change: (a) Hold $\hat{B}^{w-1}$ fixed and minimize, $\hat{D}_1^{w}, \ldots, \hat{D}_Q^{w} = \arg\min_{D_1, \ldots, D_Q} \sum_{q=1}^{Q} \frac{1}{|D_q|} \sum_{l,m \in D_q} \|X\hat{\beta}_l^{w-1} - X\hat{\beta}_m^{w-1}\|_2^2$. The above can be solved by performing K-means clustering on the $r$ $n$-dimensional vectors $X\hat{\beta}_1^{w-1}, \ldots, X\hat{\beta}_r^{w-1}$. (b) Holding $\hat{D}_1^{w}, \ldots, \hat{D}_Q^{w}$ fixed, the $w$th estimate of $B$ is $\hat{B}^{w} = \arg\min_{B \in \mathbb{R}^{p \times r}} \frac{1}{2n} \sum_{i=1}^{n} \sum_{c=1}^{r} (y_{ic} - x_i^{T}\beta_c)^2 + \delta\|B\|_1 + \cdots\, \|X(\beta_l - \beta_m)\|_2^2$.
Open Source Code	Yes	The mcen R package that implements the methods outlined in this article is available on CRAN (Sherwood and Price, 2018).
Open Datasets	Yes	Votavova et al. (2011) collected gene expression profiles, demographic and birth information from 72 pregnant mothers.
Dataset Splits	Yes	To evaluate the methods we randomly partitioned the data into 50 training and 15 testing samples. We divide 2000 transactions into training and validation sets. The first 1000 transactions are used to train our models, with 3-fold cross validation used to select the tuning parameters for both MCEN and SEN. The predictive performance of the models are then compared using the next 1000 transactions.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments, such as GPU/CPU models or memory amounts.
Software Dependencies	Yes	The mcen R package that implements the methods outlined in this article is available on CRAN (Sherwood and Price, 2018).
Experiment Setup	Yes	Tuning parameters for all methods are selected using 10-folds cross validation. For the MCEN and TMCEN methods cluster sizes of 2, 3 and 4 are considered. In the training data all variables are centered and scaled to have mean zero and a standard deviation of one. We filter the gene expression data for each response by using the top 25 genes in terms of absolute value of correlation with a given response.
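
Step (a) of the procedure quoted in the Pseudocode row can be illustrated with a short example. The following is a minimal sketch in pure Python, not the mcen implementation: it groups the r fitted-value vectors with a plain Lloyd's K-means. The helper names (matvec, sq_dist, kmeans, fusion_penalty), the toy matrices, the explicit starting centers, and the 1/|D_q| weighting over ordered pairs are all assumptions made for illustration. Step (b), the penalized regression update, is not shown.

```python
# Illustrative sketch of step (a) only; all names are hypothetical, not the mcen API.

def matvec(X, beta):
    """Return X @ beta for a list-of-rows matrix X and a coefficient list beta."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers; returns one label per point."""
    centers = [list(c) for c in init_centers]  # copy so the caller's data is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda q: sq_dist(p, centers[q]))
            for p in points
        ]
        if new_labels == labels:  # group assignments did not change: stop
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for q in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:  # keep the old center if a cluster is empty
                centers[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def fusion_penalty(points, labels):
    """Sum over clusters of (1/|D_q|) times the summed squared distances over
    ordered within-cluster pairs (assumed weighting, for illustration)."""
    total = 0.0
    for q in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == q]
        total += sum(sq_dist(u, v) for u in members for v in members) / len(members)
    return total

# Toy data (assumed): n = 3 observations, p = 2 predictors, r = 4 responses.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_columns = [[1.0, 0.0], [1.1, 0.0], [0.0, 2.0], [0.0, 2.2]]  # current beta estimates

fitted = [matvec(X, beta) for beta in B_columns]  # the r n-dimensional vectors
labels = kmeans(fitted, init_centers=[fitted[0], fitted[2]])
print(labels)  # responses 1-2 share one group, responses 3-4 the other
```

In this toy case the grouping found by K-means gives a smaller value of fusion_penalty than a grouping that mixes the two pairs of responses, which is the sense in which K-means solves the minimization in step (a). Lloyd's algorithm only finds a local minimum, so in practice the result depends on the starting centers.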