Fast Stagewise Sparse Factor Regression
Authors: Kun Chen, Ruipeng Dong, Wanwan Xu, Zemin Zheng
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach. [...] In Section 4, extensive simulation studies demonstrate the effectiveness and scalability of our approach. An application in genetics is presented in Section 5. |
| Researcher Affiliation | Academia | Kun Chen, Department of Statistics, University of Connecticut, Storrs, CT, USA; Ruipeng Dong, International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, China; Wanwan Xu, Department of Biostatistics, Yale University, New Haven, CT, USA; Zemin Zheng, International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, China |
| Pseudocode | Yes | Algorithm 1 Contended Stagewise Learning for CURE (Pseudo code) |
| Open Source Code | No | We have implemented all the proposed computational methods in a user-friendly R package with Rcpp (R Core Team, 2021). // The text mentions an R package implementation but does not explicitly state that the code for *this specific work* is publicly released, nor does it link to a repository. |
| Open Datasets | Yes | Here, we analyze the yeast eQTL data set described by Brem and Kruglyak (2005) and Storey et al. (2005), to illustrate the power and scalability of the proposed approaches for estimating the associations between p = 3244 genetic markers and q = 54 genes that belong to the yeast Mitogen-activated protein kinases (MAPKs) signaling pathway (Kanehisa et al., 2009), with data collected from n = 112 yeast samples. |
| Dataset Splits | Yes | To make the comparison fair, all the methods have the same pre-specified rank, which is selected from RRR via 10-fold cross validation. Specifically, each time the data set is randomly split into 80% for model fitting and 20% for computing the out-sample mean squared error (MSE) of the fitted model. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions computation times without specifying the hardware environment. |
| Software Dependencies | Yes | We have implemented all the proposed computational methods in a user-friendly R package with Rcpp (R Core Team, 2021). |
| Experiment Setup | Yes | We conduct simulation studies with data generated from the co-sparse factor regression model as specified in (1). We consider three simulation setups, which differ mainly on the generation of the true coefficient matrix $C^* = U^* D^* V^{*\mathrm{T}} \in \mathbb{R}^{p \times q}$, where $U^* = [u_1^*, \ldots, u_r^*]$, $V^* = [v_1^*, \ldots, v_r^*]$, and $D^* = \mathrm{diag}\{d_1^*, \ldots, d_r^*\}$. Model I is a unit-rank model mainly for analyzing the properties of CURE, in which we set $u_1^* = \tilde{u}_1 / \lVert \tilde{u}_1 \rVert_2$ where $\tilde{u}_1 = [10, 10, 8, 8, 5, 5, \mathrm{rep}(3, 5), \mathrm{rep}(-3, 5), \mathrm{rep}(0, p - 16)]^{\mathrm{T}}$, $v_1^* = \tilde{v}_1 / \lVert \tilde{v}_1 \rVert_2$ where $\tilde{v}_1 = [10, 9, 8, 7, 6, 5, 4, 3, \mathrm{rep}(2, 17), \mathrm{rep}(0, q - 25)]^{\mathrm{T}}$, and $d_1^* = 20$, where $\mathrm{rep}(a, b)$ represents a $1 \times b$ vector with all entries equal to $a$. [...] We set $\sigma$ to control the signal-to-noise ratio (SNR), defined as $\mathrm{SNR} = \lVert d_r^* X u_r^* v_r^{*\mathrm{T}} \rVert_2 / \lVert E \rVert_F$. Models II and III are considered here with $n = q = 100$, $p \in \{100, 200, 400\}$, $r \in \{3, 6\}$, $\mathrm{SNR} \in \{0.25, 0.5, 1\}$, $\rho = 0.3$ and $\epsilon = 0.5 \times 10^{-2}$. The experiment under each setting is repeated 200 times. |
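
The Model I construction quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' R/Rcpp implementation. The helper names `rep`, `unit`, and `model_one` are hypothetical. The entries are taken as printed in the row, except that the second block of threes is treated as negative, which is an assumption; any other signs should be checked against the paper.

```python
import math


def rep(a, b):
    """Mimic R's rep(a, b): a list holding b copies of a."""
    return [a] * b


def unit(v):
    """Rescale v to unit Euclidean (l2) norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def model_one(p, q, d1=20.0):
    """Build the unit-rank matrix C = d1 * u1 * v1^T as a p-by-q nested list.

    Hypothetical helper; values follow the Model I description as printed.
    """
    if p < 16 or q < 25:
        raise ValueError("Model I needs p >= 16 and q >= 25")
    # Assumption: the second rep block is -3 (its sign is unclear in the source).
    u_tilde = [10, 10, 8, 8, 5, 5] + rep(3, 5) + rep(-3, 5) + rep(0, p - 16)
    v_tilde = [10, 9, 8, 7, 6, 5, 4, 3] + rep(2, 17) + rep(0, q - 25)
    u1, v1 = unit(u_tilde), unit(v_tilde)
    # Outer product scaled by the single singular value d1.
    C = [[d1 * ui * vj for vj in v1] for ui in u1]
    return u1, v1, C


u1, v1, C = model_one(p=100, q=100)
nonzero_rows = sum(any(x != 0 for x in row) for row in C)
nonzero_cols = sum(any(row[j] != 0 for row in C) for j in range(len(C[0])))
frobenius = math.sqrt(sum(x * x for row in C for x in row))
```

Because both vectors have unit norm, the Frobenius norm of `C` equals `d1`, and the co-sparsity pattern (16 active rows, 25 active columns) can be read directly from `nonzero_rows` and `nonzero_cols`.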