reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Stagewise Sparse Factor Regression

Authors: Kun Chen, Ruipeng Dong, Wanwan Xu, Zemin Zheng

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach. [...] In Section 4, extensive simulation studies demonstrate the effectiveness and scalability of our approach. An application in genetics is presented in Section 5.
Researcher Affiliation	Academia	Kun Chen EMAIL Department of Statistics University of Connecticut Storrs, CT, USA; Ruipeng Dong International Institute of Finance, The School of Management, University of Science and Technology of China Hefei, Anhui, China; Wanwan Xu Department of Biostatistics Yale University New Haven, CT, USA; Zemin Zheng EMAIL International Institute of Finance, The School of Management, University of Science and Technology of China Hefei, Anhui, China
Pseudocode	Yes	Algorithm 1 Contended Stagewise Learning for CURE (Pseudo code)
Open Source Code	No	We have implemented all the proposed computational methods in a user-friendly R package with RCpp (R Core Team, 2021). // The text mentions an R package implementation but does not explicitly state that the code for this specific work is publicly released or provide a link to a repository.
Open Datasets	Yes	Here, we analyze the yeast eQTL data set described by Brem and Kruglyak (2005) and Storey et al. (2005), to illustrate the power and scalability of the proposed approaches for estimating the associations between p = 3244 genetic markers and q = 54 genes that belong to the yeast Mitogen-activated protein kinases (MAPKs) signaling pathway (Kanehisa et al., 2009), with data collected from n = 112 yeast samples.
Dataset Splits	Yes	To make the comparison fair, all the methods have the same pre-specified rank, which is selected from RRR via 10-fold cross validation. Specifically, each time the data set is randomly split into 80% for model fitting and 20% for computing the out-sample mean squared error (MSE) of the fitted model.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions computation times without specifying the hardware environment.
Software Dependencies	Yes	We have implemented all the proposed computational methods in a user-friendly R package with RCpp (R Core Team, 2021).
Experiment Setup	Yes	We conduct simulation studies with data generated from the co-sparse factor regression model as specified in (1). We consider three simulation setups, which differ mainly on the generation of the true coefficient matrix $C = U D V^{T} \in \mathbb{R}^{p \times q}$, where $U = [u_1, \ldots, u_r]$, $V = [v_1, \ldots, v_r]$, and $D = \mathrm{diag}\{d_1, \ldots, d_r\}$. Model I is a unit-rank model mainly for analyzing the properties of CURE, in which we set $u_1 = \tilde{u}_1 / \|\tilde{u}_1\|_2$ where $\tilde{u}_1 = [10, 10, 8, 8, 5, 5, \mathrm{rep}(3, 5), \mathrm{rep}(-3, 5), \mathrm{rep}(0, p - 16)]^{T}$, $v_1 = \tilde{v}_1 / \|\tilde{v}_1\|_2$ where $\tilde{v}_1 = [10, 9, 8, 7, 6, 5, 4, 3, \mathrm{rep}(2, 17), \mathrm{rep}(0, q - 25)]^{T}$, and $d_1 = 20$, where rep(a, b) represent a $1 \times b$ vector with all entries equaling to a. [...] We set $\sigma$ to control the signal-to-noise ratio (SNR), defined as $\mathrm{SNR} = \|d_r X u_r v_r^{T}\|_2 / \|E\|_F$. Models II and III are considered here with $n = q = 100$, $p \in \{100, 200, 400\}$, $r \in \{3, 6\}$, $\mathrm{SNR} \in \{0.25, 0.5, 1\}$, $\rho = 0.3$ and $\epsilon = 0.5 \times 10^{-2}$. The experiment under each setting is repeated 200 times.
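
The Model I construction quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' R/Rcpp implementation: the function names are hypothetical, the vector entries are taken as they appear in the extracted text (with the second block of 3s read as negative), the design matrix X is drawn with i.i.d. standard normal entries because the quoted passage does not describe its distribution, and the noise scale is chosen so that the quoted SNR ratio is met exactly for the single generated draw.

```python
import numpy as np


def rep(a, b):
    """Mimic the rep(a, b) notation: a length-b vector with every entry equal to a."""
    return np.full(b, float(a))


def model_one_coefficient(p, q, d1=20.0):
    """Build the unit-rank matrix C = d1 * u1 v1^T described for Model I."""
    if p < 16 or q < 25:
        raise ValueError("Model I needs p >= 16 and q >= 25")
    # Entries as printed in the extracted text; the sign of the second
    # block of 3s is an assumption (read here as -3).
    u_raw = np.concatenate([[10, 10, 8, 8, 5, 5], rep(3, 5), rep(-3, 5), rep(0, p - 16)])
    v_raw = np.concatenate([[10, 9, 8, 7, 6, 5, 4, 3], rep(2, 17), rep(0, q - 25)])
    u1 = u_raw / np.linalg.norm(u_raw)  # normalize to unit Euclidean length
    v1 = v_raw / np.linalg.norm(v_raw)
    return d1 * np.outer(u1, v1), u1, v1


def simulate(n, p, q, snr, seed=0):
    """Draw (X, Y) with Y = X C + E and rescale the noise to hit the target SNR."""
    rng = np.random.default_rng(seed)
    C, _, _ = model_one_coefficient(p, q)
    X = rng.standard_normal((n, p))   # assumption: i.i.d. N(0, 1) design
    signal = X @ C                    # rank one, equals d1 * X u1 v1^T
    E0 = rng.standard_normal((n, q))  # unscaled noise
    # SNR = ||signal||_2 / ||E||_F with E = sigma * E0, solved for sigma.
    sigma = np.linalg.norm(signal, 2) / (snr * np.linalg.norm(E0, "fro"))
    return X, signal + sigma * E0, C, sigma
```

For example, `simulate(100, 100, 100, snr=0.5)` returns a design matrix, a response matrix, the true coefficient matrix, and the noise scale that was used. Repeating the call with different seeds would mimic the 200 replications mentioned in the table.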