Structure Learning of Undirected Graphical Models for Count Data
Authors: Nguyen Thi Kim Hue, Monica Chiogna
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the performance of PC-LPGM in recovering the true structure of the graphs in situations where relatively moderate sample sizes are available, extensive simulation studies are conducted, that also allow to compare our proposal with its main competitors. A biological validation of the algorithm is presented through the analysis of two real data sets. |
| Researcher Affiliation | Academia | Nguyen Thi Kim Hue (EMAIL), Department of Statistical Sciences, University of Padova, Via C. Battisti, 241, 35121 Padova, Italy; Monica Chiogna (EMAIL), Department of Statistical Sciences Paolo Fortunati, University of Bologna, Via Belle Arti 41, 40126 Bologna, Italy |
| Pseudocode | Yes | The pseudo-code of our algorithm is illustrated in Algorithm 1, where $\mathrm{adj}(\hat{G}, s) = \{t \in V : (s, t) \in \hat{G}\}$ denotes the estimated set of all nodes that are adjacent to $s$ on the graph $\hat{G}$. |
| Open Source Code | No | The paper does not explicitly provide a specific link or statement for the open-source code of the methodology described (PC-LPGM). While it mentions that competitor algorithms are implemented in R packages (e.g., 'XMRF', 'huge') or available via specific links (e.g., 'learnPDN (see https://sfb876.tu-dortmund.de/auto?self=%24eon9ai8e80)'), it does not do so for PC-LPGM itself. |
| Open Datasets | Yes | miRNAs expression, obtained by high-throughput sequencing, was downloaded from The Cancer Genome Atlas (TCGA) portal (https://tcga-data.nci.nih.gov/docs/publications/brca_2012/). The raw count data set consisted of 544 patients and 1046 miRNAs. ... Gene expression, obtained by high-throughput sequencing, was downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99251). The raw count data set consisted of 542 cells and we selected 850 transcription factor genes (the list of transcription factors was downloaded from https://github.com/diyadas/HBC-regen/tree/master/ref). |
| Dataset Splits | No | The paper conducts simulation studies where data is generated based on predefined graph structures and sample sizes, and evaluates the algorithm's ability to recover the true structure. For real data, it describes preprocessing steps and uses the entire preprocessed dataset for network inference. However, it does not specify train/test/validation splits for model evaluation in the conventional sense. |
| Hardware Specification | Yes | The runtime analysis (in seconds) was done on a CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux, using R 3.5.1 and 20 cores. |
| Software Dependencies | Yes | The runtime analysis (in seconds) was done on a CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux, using R 3.5.1 and 20 cores. |
| Experiment Setup | Yes | PC-LPGM: level of significance of tests 1%; m = 8 for p = 10; m = 3 for p = 100; LPGM: β = 0.05, nlambda = 10, B = 20; λmin/λmax = 0.01; γ = 0.001, sth = 0.9; VSL: β = 0.1, nlambda = 10, B = 20; GLASSO: β = 0.1, nlambda = 10, B = 20; NPN-copula: β = 0.1, nlambda = 10, B = 20; NPN-skeptic: β = 0.1, nlambda = 10, B = 20. |
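
To make the adjacency-set notation in the Pseudocode row and the PC-LPGM values in the Experiment Setup row concrete, here is a minimal Python sketch. It is not the authors' implementation (the table notes that no PC-LPGM code is released), and the paper's own experiments were run in R. The function name `adj`, the edge-list representation, the dictionary keys, and the toy graph are all assumptions made for illustration. The summary does not say what `m` denotes, so it is stored as reported, keyed by the number of variables `p`.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def adj(edges: Iterable[Edge], s: Hashable) -> Set[Hashable]:
    """Return {t : (s, t) is an edge}, treating every edge as undirected."""
    neighbours: Set[Hashable] = set()
    for u, v in edges:
        if u == s:
            neighbours.add(v)
        elif v == s:
            neighbours.add(u)
    return neighbours


# PC-LPGM values copied from the Experiment Setup row.
# Key names are hypothetical; only the numbers come from the table.
PC_LPGM_SETTINGS = {
    "significance_level": 0.01,   # "level of significance of tests 1%"
    "m_by_p": {10: 8, 100: 3},    # m = 8 for p = 10; m = 3 for p = 100
}

# Hypothetical toy graph, not taken from the paper.
estimated_edges = [(1, 2), (2, 3), (1, 4)]
print(sorted(adj(estimated_edges, 1)))
```

Storing the graph as an undirected edge list keeps the helper independent of any particular graph library; an adjacency matrix would serve equally well for the small `p` values listed above.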