Fast Stagewise Sparse Factor Regression

Authors: Kun Chen, Ruipeng Dong, Wanwan Xu, Zemin Zheng

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach. [...] In Section 4, extensive simulation studies demonstrate the effectiveness and scalability of our approach. An application in genetics is presented in Section 5.
Researcher Affiliation | Academia | Kun Chen EMAIL Department of Statistics University of Connecticut Storrs, CT, USA; Ruipeng Dong International Institute of Finance, The School of Management, University of Science and Technology of China Hefei, Anhui, China; Wanwan Xu Department of Biostatistics Yale University New Haven, CT, USA; Zemin Zheng EMAIL International Institute of Finance, The School of Management, University of Science and Technology of China Hefei, Anhui, China
Pseudocode | Yes | Algorithm 1 Contended Stagewise Learning for CURE (Pseudo code)
Open Source Code | No | We have implemented all the proposed computational methods in a user-friendly R package with Rcpp (R Core Team, 2021). // The text mentions an R package implementation but does not explicitly state that the code for *this specific work* is publicly released or provide a link to a repository.
Open Datasets | Yes | Here, we analyze the yeast eQTL data set described by Brem and Kruglyak (2005) and Storey et al. (2005), to illustrate the power and scalability of the proposed approaches for estimating the associations between p = 3244 genetic markers and q = 54 genes that belong to the yeast Mitogen-activated protein kinases (MAPKs) signaling pathway (Kanehisa et al., 2009), with data collected from n = 112 yeast samples.
Dataset Splits | Yes | To make the comparison fair, all the methods have the same pre-specified rank, which is selected from RRR via 10-fold cross validation. Specifically, each time the data set is randomly split into 80% for model fitting and 20% for computing the out-sample mean squared error (MSE) of the fitted model.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions computation times without specifying the hardware environment.
Software Dependencies | Yes | We have implemented all the proposed computational methods in a user-friendly R package with Rcpp (R Core Team, 2021).
Experiment Setup | Yes | We conduct simulation studies with data generated from the co-sparse factor regression model as specified in (1). We consider three simulation setups, which differ mainly in the generation of the true coefficient matrix $C = U D V^T \in \mathbb{R}^{p \times q}$, where $U = [u_1, \ldots, u_r]$, $V = [v_1, \ldots, v_r]$, and $D = \mathrm{diag}\{d_1, \ldots, d_r\}$. Model I is a unit-rank model mainly for analyzing the properties of CURE, in which we set $u_1 = \tilde{u}_1 / \|\tilde{u}_1\|_2$ where $\tilde{u}_1 = [10, 10, 8, 8, 5, 5, \mathrm{rep}(3, 5), \mathrm{rep}(-3, 5), \mathrm{rep}(0, p - 16)]^T$, $v_1 = \tilde{v}_1 / \|\tilde{v}_1\|_2$ where $\tilde{v}_1 = [10, 9, 8, 7, 6, 5, 4, 3, \mathrm{rep}(2, 17), \mathrm{rep}(0, q - 25)]^T$, and $d_1 = 20$, where $\mathrm{rep}(a, b)$ represents a $1 \times b$ vector with all entries equal to $a$. [...] We set $\sigma$ to control the signal-to-noise ratio (SNR), defined as $\mathrm{SNR} = \|d_r X u_r v_r^T\|_2 / \|E\|_F$. Models II and III are considered here with $n = q = 100$, $p \in \{100, 200, 400\}$, $r \in \{3, 6\}$, $\mathrm{SNR} \in \{0.25, 0.5, 1\}$, $\rho = 0.3$ and $\epsilon = 0.5 \times 10^{-2}$. The experiment under each setting is repeated 200 times.
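
The Model I construction in the Experiment Setup row can be made concrete with a short sketch. The following is not the authors' R package; it is a minimal standard-library Python illustration under stated assumptions. The helper names (`rep`, `unit`, `fro`, `model_one_factors`, `simulate`) are hypothetical, the design matrix is drawn as i.i.d. standard normal because the quoted passage elides how X is generated, the second five-entry block of the left vector is taken to be -3 (the sign is an assumption), and the noise is assumed to enter as sigma times a standard normal matrix so that sigma can be solved for directly from the SNR definition.

```python
import math
import random


def rep(a, b):
    """Mirror the rep(a, b) notation: a length-b list with every entry equal to a."""
    return [float(a)] * b


def unit(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def fro(mat):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in mat for x in row))


def model_one_factors(p, q, d1=20.0):
    """Return (d1, u1, v1) for the unit-rank Model I; needs p >= 16 and q >= 25."""
    # The -3 block is an ASSUMPTION about a sign lost in extraction.
    u_raw = [10.0, 10.0, 8.0, 8.0, 5.0, 5.0] + rep(3, 5) + rep(-3, 5) + rep(0, p - 16)
    v_raw = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0] + rep(2, 17) + rep(0, q - 25)
    return d1, unit(u_raw), unit(v_raw)


def simulate(n, p, q, snr, seed=0):
    """Draw (X, Y, signal, sigma) with Y = signal + sigma * E0 at the requested SNR."""
    rng = random.Random(seed)
    d1, u, v = model_one_factors(p, q)
    # ASSUMPTION: i.i.d. standard normal design; the quoted text elides the design.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    xu = [sum(x * w for x, w in zip(row, u)) for row in X]
    # Rank-one signal d1 * (X u1) v1^T; its spectral and Frobenius norms coincide.
    signal = [[d1 * s * w for w in v] for s in xu]
    # ASSUMPTION: noise is sigma * E0 with E0 standard normal.
    E0 = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(n)]
    sigma = fro(signal) / (snr * fro(E0))
    Y = [[s + sigma * e for s, e in zip(s_row, e_row)]
         for s_row, e_row in zip(signal, E0)]
    return X, Y, signal, sigma
```

Calling `simulate(100, 100, 100, 0.5)` would give one replicate at a size matching the reported n = q = 100. Because the signal is rank one, the sketch uses the Frobenius norm in the numerator, which equals the spectral norm in that case.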