reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Finding Groups of Cross-Correlated Features in Bi-View Data

Authors: Miheer Dewaskar, John Palowitch, Mark He, Michael I. Love, Andrew B. Nobel

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Compared to existing methods for detecting cross-correlated features, BSP was the best at recovering true bimodules with sufficient signal, while limiting the false discoveries. In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. Keywords: cross-correlation network, iterative testing, permutation distribution, eQTL analysis, temperature and precipitation correlation. Section 3 is devoted to a simulation study that uses a complex model to capture some aspects of real bi-view data. Here, we evaluate the performance of BSP and compare it to that of CONDOR, sCCA, and Matrix eQTL. Section 4 describes and evaluates the results of BSP, CONDOR, and Matrix eQTL applied to an eQTL dataset from the GTEx consortium.
Researcher Affiliation	Collaboration	Miheer Dewaskar EMAIL Department of Statistical Science Duke University Durham, NC 27708-0251, USA John Palowitch EMAIL Google Research California, USA Mark He EMAIL Columbia University Mailman School of Public Health 722 West 168th St., NY 10032, USA Michael I. Love EMAIL Department of Biostatistics and Department of Genetics University of North Carolina at Chapel Hill Chapel Hill, NC 27599-7400, USA Andrew B. Nobel EMAIL Department of Statistics and Operations Research and Department of Biostatistics University of North Carolina at Chapel Hill Chapel Hill, NC 27599-3260, USA
Pseudocode	Yes	In practice, BSP is run repeatedly, initializing with every pair (A0, B0), where A0 = {s} is a single feature in S and B0 is the set of features t ∈ T that are significantly correlated with s, or B0 = {t} is a single feature in T and A0 is the set of features s ∈ S that are significantly correlated with t. Pseudocode for BSP is given in Algorithm 1. Algorithm 1: Bimodule Search Procedure (BSP)
Open Source Code	Yes	BSP R package: https://github.com/miheerdew/cbce
Open Datasets	Yes	In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. Application of BSP to eQTL Analysis ... based on data from the Genotype Tissue Expression (GTEx) project. The Climatic Research Unit (CRU TS version 4.01) data (Harris et al., 2014) contains daily global measurements of temperature (T) and precipitation (P) levels on land over a 0.5° × 0.5° (360 pixels by 720 pixels) resolution grid from 1901 to 2016.
Dataset Splits	Yes	To select the false discovery parameter α ∈ (0, 1) for BSP, we estimate the edge-error for each value of α from a pre-specified grid... we estimate the edge-error for BSP by running it on instances of the half-permuted dataset in which the sample labels for half of the features from each data type have been permuted. We generate a half-permuted dataset (X, Y) as follows: 1. Randomly select half of the features, Ŝ ⊆ S and T̂ ⊆ T, from each data type. In order to assess the propensity of each method to detect spurious bimodules, we applied BSP and CONDOR to five data sets obtained by jointly permuting the sample labels for the expression measurements and most covariates...
Hardware Specification	Yes	The various methods used in this analysis were run on a dedicated computer that had Intel(R) Xeon(R) E5-2640 CPU with 20 parallel cores at 2.50 GHz base frequency, and a 512 GB random access memory along with L1, L2 and L3 caches of sizes 1.3, 5 and 50 MB respectively.
Software Dependencies	Yes	The computer ran the Windows Server 2012 R2 operating system and we used the Microsoft R Open 3.5.3 software to perform most of our analysis, since it has multi-core implementations of linear algebra routines.
Experiment Setup	Yes	The parameter α = 0.02 for BSP was chosen to keep the edge-error estimates based on half-permuted data (see Section 2.8) under 0.05. The q-value cutoff for Matrix eQTL was also taken to be 0.05. More details on how the various methods were run are provided in Appendix B. We applied BSP to the thyroid eQTL data with false discovery parameter α = 0.03, selected using a permutation-based procedure to keep the edge-error estimates under 0.05. More precisely, for various penalty parameters λ ∈ [0, 1], we ran sCCA (Witten et al., 2020) to find 100 canonical covariate pairs with the ℓ1 norm constraint of λ√p and λ√q on the coefficients of the linear combinations corresponding to S and T respectively. Initially, we considered λ = 0.1, chosen by the permutation based procedure provided with the software. However, the resulting bimodules were very large and had high edge-error (further details are provided in Section B.2.2). Based on a rough grid search, we then ran the procedure with each value λ ∈ {.01, .02, .03, .04} to obtain smaller bimodules.
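
The half-permutation step quoted in the Dataset Splits row can be illustrated with a minimal Python/NumPy sketch. Only the first step (randomly selecting half of the features from each data type) appears in the excerpt above. The rest is an assumption made for illustration: the function name `half_permute`, the samples-by-features layout of the matrices, and the use of a single shared permutation of the sample labels for the selected features in both views. This is not taken from the paper's Algorithm or from the `cbce` R package.

```python
import numpy as np

def half_permute(X, Y, rng):
    """Return a half-permuted copy of a bi-view dataset (illustrative sketch).

    X is an n x p matrix and Y an n x q matrix, with rows as samples.
    Assumption: one shared permutation of the sample labels is applied to
    the selected half of the features in each view.
    """
    n, p = X.shape
    if Y.shape[0] != n:
        raise ValueError("X and Y must have the same number of samples")
    q = Y.shape[1]

    # Step 1 (from the quoted text): randomly select half of the features
    # from each data type.
    S_hat = rng.choice(p, size=p // 2, replace=False)
    T_hat = rng.choice(q, size=q // 2, replace=False)

    # Assumed step: permute the sample labels of the selected features only.
    perm = rng.permutation(n)
    X_perm = X.copy()
    Y_perm = Y.copy()
    X_perm[:, S_hat] = X[perm][:, S_hat]
    Y_perm[:, T_hat] = Y[perm][:, T_hat]
    return X_perm, Y_perm, S_hat, T_hat

# Small usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
Y = rng.normal(size=(10, 4))
X_perm, Y_perm, S_hat, T_hat = half_permute(X, Y, rng)
```

Under this assumed construction, a permuted feature in one view is decoupled from every unpermuted feature in the other view, so any discovered pair linking the two groups can be counted as a false edge when estimating the edge-error for a given α.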