Finding Groups of Cross-Correlated Features in Bi-View Data

Authors: Miheer Dewaskar, John Palowitch, Mark He, Michael I. Love, Andrew B. Nobel

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to existing methods for detecting cross-correlated features, BSP was the best at recovering true bimodules with sufficient signal, while limiting the false discoveries. In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. Keywords: cross-correlation network, iterative testing, permutation distribution, eQTL analysis, temperature and precipitation correlation. Section 3 is devoted to a simulation study that uses a complex model to capture some aspects of real bi-view data. Here, we evaluate the performance of BSP and compare it to that of CONDOR, sCCA, and Matrix EQTL. Section 4 describes and evaluates the results of BSP, CONDOR, and Matrix EQTL applied to an eQTL dataset from the GTEx consortium.
Researcher Affiliation | Collaboration | Miheer Dewaskar, Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA; John Palowitch, Google Research, California, USA; Mark He, Columbia University Mailman School of Public Health, 722 West 168th St., NY 10032, USA; Michael I. Love, Department of Biostatistics and Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7400, USA; Andrew B. Nobel, Department of Statistics and Operations Research and Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3260, USA
Pseudocode | Yes | In practice, BSP is run repeatedly, initializing with every pair (A0, B0), where A0 = {s} is a single feature in S and B0 is the set of features t ∈ T that are significantly correlated with s, or B0 = {t} is a single feature in T and A0 is the set of features s ∈ S that are significantly correlated with t. Pseudocode for BSP is given in Algorithm 1. Algorithm 1: Bimodule Search Procedure (BSP)
Open Source Code | Yes | BSP R package: https://github.com/miheerdew/cbce
Open Datasets | Yes | In addition, we applied BSP to the problem of expression quantitative trait loci (eQTL) analysis using data from the GTEx consortium. Application of BSP to eQTL Analysis ... based on data from the Genotype-Tissue Expression (GTEx) project. The Climatic Research Unit (CRU TS version 4.01) data (Harris et al., 2014) contains daily global measurements of temperature (T) and precipitation (P) levels on land over a 0.5° × 0.5° (360 pixels by 720 pixels) resolution grid from 1901 to 2016.
Dataset Splits | Yes | To select the false discovery parameter α ∈ (0, 1) for BSP, we estimate the edge-error for each value of α from a pre-specified grid... we estimate the edge-error for BSP by running it on instances of the half-permuted dataset in which the sample labels for half of the features from each data type have been permuted. We generate a half-permuted dataset (X̃, Ỹ) as follows: 1. Randomly select half of the features, Ŝ ⊆ S and T̂ ⊆ T, from each data type. In order to assess the propensity of each method to detect spurious bimodules, we applied BSP and CONDOR to five data sets obtained by jointly permuting the sample labels for the expression measurements and most covariates...
Hardware Specification | Yes | The various methods used in this analysis were run on a dedicated computer that had an Intel(R) Xeon(R) E5-2640 CPU with 20 parallel cores at 2.50 GHz base frequency, and 512 GB of random access memory along with L1, L2 and L3 caches of sizes 1.3, 5 and 50 MB respectively.
Software Dependencies | Yes | The computer ran the Windows Server 2012 R2 operating system and we used the Microsoft R Open 3.5.3 software to perform most of our analysis, since it has multi-core implementations of linear algebra routines.
Experiment Setup | Yes | The parameter α = 0.02 for BSP was chosen to keep the edge-error estimates based on half-permuted data (see Section 2.8) under 0.05. The q-value cutoff for Matrix EQTL was also taken to be 0.05. More details on how the various methods were run are provided in Appendix B. We applied BSP to the thyroid eQTL data with false discovery parameter α = 0.03, selected using a permutation-based procedure to keep the edge-error estimates under 0.05. More precisely, for various penalty parameters λ ∈ [0, 1], we ran sCCA (Witten et al., 2020) to find 100 canonical covariate pairs with the ℓ1 norm constraint of λ√p and λ√q on the coefficients of the linear combinations corresponding to S and T respectively. Initially, we considered λ = 0.1, chosen by the permutation-based procedure provided with the software. However, the resulting bimodules were very large and had high edge-error (further details are provided in Section B.2.2). Based on a rough grid search, we then ran the procedure with each value λ ∈ {0.01, 0.02, 0.03, 0.04} to obtain smaller bimodules.
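The initialization quoted in the Pseudocode row (every single feature on one side, paired with the features on the other side that are significantly correlated with it) can be sketched in plain Python. This is a rough illustration and not the cbce package's implementation: the function names, the dict-of-lists data layout, the Fisher z-test, the Bonferroni-style threshold, and the choice to skip pairs with an empty side are all assumptions made here, standing in for whatever significance test BSP actually uses.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return suv / math.sqrt(suu * svv)

def fisher_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (stand-in test, an assumption)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def initial_pairs(X, Y, alpha):
    """Build (A0, B0) seeds. X and Y map feature name -> list of n sample values."""
    n = len(next(iter(X.values())))
    pairs = []
    # Seeds of the form A0 = {s}, B0 = features of T significantly correlated with s.
    for s, xs in X.items():
        B0 = {t for t, yt in Y.items()
              if fisher_p_value(pearson(xs, yt), n) <= alpha / len(Y)}
        if B0:  # skipping empty seeds is a choice of this sketch
            pairs.append(({s}, B0))
    # Seeds of the form B0 = {t}, A0 = features of S significantly correlated with t.
    for t, yt in Y.items():
        A0 = {s for s, xs in X.items()
              if fisher_p_value(pearson(xs, yt), n) <= alpha / len(X)}
        if A0:
            pairs.append((A0, {t}))
    return pairs
```

Each returned pair would then serve as one starting point for the iterative search of Algorithm 1, which is not reproduced here.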
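The half-permuted dataset described in the Dataset Splits row can be sketched as follows. Only the first step (randomly selecting half of the features from each data type) is quoted above; the second step here, applying one random permutation of the sample labels to all selected features within a data type, is an assumption based on the description that "the sample labels for half of the features from each data type have been permuted". The function names and the dict-of-lists layout are hypothetical.

```python
import random

def half_permute(view, rng):
    """Permute the sample labels of a random half of the features in one data type.

    view maps feature name -> list of n sample values.
    Returns the new view and the set of feature names that were permuted.
    """
    names = sorted(view)
    chosen = set(rng.sample(names, len(names) // 2))
    n = len(view[names[0]])
    perm = list(range(n))
    rng.shuffle(perm)  # one shared permutation per data type (an assumption)
    out = {}
    for name, values in view.items():
        if name in chosen:
            out[name] = [values[i] for i in perm]
        else:
            out[name] = list(values)  # untouched features are copied as-is
    return out, chosen

def half_permuted_dataset(X, Y, seed=0):
    """Return half-permuted copies of both data types plus the selected feature sets."""
    rng = random.Random(seed)
    X_perm, S_hat = half_permute(X, rng)
    Y_perm, T_hat = half_permute(Y, rng)
    return X_perm, Y_perm, S_hat, T_hat
```

Under this construction, any discovered association that involves a feature in the returned sets is presumably spurious, which is what would make such data usable for estimating the edge-error.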
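The Experiment Setup row says that α was chosen so that the edge-error estimates stay under 0.05. One way to read that rule is sketched below: given an edge-error estimate for each α on a pre-specified grid, take the largest α whose estimate is below the target. The "largest admissible value" tie-break and the function name are assumptions, since the quoted text does not say how ties among admissible values are resolved.

```python
def select_alpha(edge_error_by_alpha, max_edge_error=0.05):
    """Pick the largest alpha whose estimated edge-error is strictly below the target.

    edge_error_by_alpha maps each grid value of alpha to its edge-error estimate
    (for example, an average over several half-permuted datasets).
    Returns None when no grid value is admissible.
    """
    admissible = [alpha for alpha, err in edge_error_by_alpha.items()
                  if err < max_edge_error]
    return max(admissible) if admissible else None
```

For instance, with the made-up estimates {0.01: 0.010, 0.05: 0.041, 0.10: 0.090} (placeholders, not values from the paper), the function returns 0.05.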