Structure Learning of Undirected Graphical Models for Count Data

Authors: Nguyen Thi Kim Hue, Monica Chiogna

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the performance of PC-LPGM in recovering the true structure of the graphs in situations where relatively moderate sample sizes are available, extensive simulation studies are conducted, that also allow to compare our proposal with its main competitors. A biological validation of the algorithm is presented through the analysis of two real data sets.
Researcher Affiliation | Academia | Nguyen Thi Kim Hue (EMAIL), Department of Statistical Sciences, University of Padova, Via C. Battisti, 241, 35121 Padova, Italy; Monica Chiogna (EMAIL), Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Via Belle Arti 41, 40126 Bologna, Italy
Pseudocode | Yes | The pseudo-code of our algorithm is illustrated in Algorithm 1, where $\mathrm{adj}(\hat{G}, s) = \{t \in V : (s, t) \in \hat{G}\}$ denotes the estimated set of all nodes that are adjacent to s on the graph $\hat{G}$.
Open Source Code | No | The paper does not provide a link to, or a statement about, open-source code for the proposed method (PC-LPGM). It notes that competitor algorithms are implemented in R packages (e.g., 'XMRF', 'huge') or available via specific links (e.g., 'learn PDN (see https://sfb876.tu-dortmund.de/auto?self=%24eon9ai8e80)'), but gives no equivalent for PC-LPGM itself.
Open Datasets | Yes | miRNAs expression, obtained by high-throughput sequencing, was downloaded from The Cancer Genome Atlas (TCGA) portal (https://tcga-data.nci.nih.gov/docs/publications/brca_2012/). The raw count data set consisted of 544 patients and 1046 miRNAs. ... Gene expression, obtained by high-throughput sequencing, was downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99251). The raw count data set consisted of 542 cells and we selected 850 transcription factor genes (the list of transcription factors was downloaded from https://github.com/diyadas/HBC-regen/tree/master/ref).
Dataset Splits | No | The paper runs simulation studies in which data are generated from predefined graph structures and sample sizes, and evaluates how well the algorithm recovers the true structure. For the real data, it describes the preprocessing steps and uses the entire preprocessed data set for network inference. It does not specify train/validation/test splits.
Hardware Specification | Yes | The runtime analysis (second) was done on an CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux and using R 3.5.1 and 20 cores.
Software Dependencies | Yes | The runtime analysis (second) was done on an CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux and using R 3.5.1 and 20 cores.
Experiment Setup | Yes | PC-LPGM: level of significance of tests 1%; m = 8 for p = 10; m = 3 for p = 100; LPGM: β = 0.05, nlambda = 10, B = 20; λmin/λmax = 0.01; γ = 0.001, sth = 0.9; VSL: β = 0.1, nlambda = 10, B = 20; GLASSO: β = 0.1, nlambda = 10, B = 20; NPN-copula: β = 0.1, nlambda = 10, B = 20; NPN-skeptic: β = 0.1, nlambda = 10, B = 20.
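
The adjacency set quoted in the Pseudocode row, the set of all nodes t that share an edge with s in the estimated graph, can be illustrated with a short Python sketch. This is only an illustration of that definition, not the authors' implementation (the paper's experiments use R, and no PC-LPGM code is linked). The function name `adj`, the edge-list representation, and the example graph are assumptions made here for illustration.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def adj(edges: Iterable[Edge], s: Hashable) -> Set[Hashable]:
    """Return the nodes adjacent to s in an undirected graph given as an edge list."""
    neighbours: Set[Hashable] = set()
    for u, v in edges:
        # Undirected graph: the pair (u, v) links u to v and v to u.
        if u == s:
            neighbours.add(v)
        elif v == s:
            neighbours.add(u)
    return neighbours


# Hypothetical estimated graph on nodes 1..4 (illustrative, not taken from the paper).
estimated_edges = [(1, 2), (2, 3), (2, 4)]
print(sorted(adj(estimated_edges, 2)))
```

Storing the graph as an edge list keeps the sketch close to the set notation in the definition; an adjacency dictionary would make repeated lookups faster for large graphs.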