reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Structure Learning of Undirected Graphical Models for Count Data

Authors: Nguyen Thi Kim Hue, Monica Chiogna

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate the performance of PC-LPGM in recovering the true structure of the graphs in situations where relatively moderate sample sizes are available, extensive simulation studies are conducted, which also allow us to compare our proposal with its main competitors. A biological validation of the algorithm is presented through the analysis of two real data sets.
Researcher Affiliation	Academia	Nguyen Thi Kim Hue EMAIL Department of Statistical Sciences University of Padova Via C. Battisti, 241 35121 Padova, Italy Monica Chiogna EMAIL Department of Statistical Sciences Paolo Fortunati University of Bologna Via Belle Arti 41 40126 Bologna, Italy
Pseudocode	Yes	The pseudo-code of our algorithm is illustrated in Algorithm 1, where adj(Ĝ, s) = {t ∈ V : (s, t) ∈ Ĝ} denotes the estimated set of all nodes that are adjacent to s on the graph Ĝ.
Open Source Code	No	The paper does not explicitly provide a specific link or statement for the open-source code of the methodology described (PC-LPGM). While it mentions that competitor algorithms are implemented in R packages (e.g., 'XMRF', 'huge') or available via specific links (e.g., 'learn PDN (see https://sfb876.tu-dortmund.de/auto?self=%24eon9ai8e80)'), it does not do so for PC-LPGM itself.
Open Datasets	Yes	miRNA expression, obtained by high-throughput sequencing, was downloaded from The Cancer Genome Atlas (TCGA) portal (https://tcga-data.nci.nih.gov/docs/publications/brca_2012/). The raw count data set consisted of 544 patients and 1046 miRNAs. ... Gene expression, obtained by high-throughput sequencing, was downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99251). The raw count data set consisted of 542 cells and we selected 850 transcription factor genes (the list of transcription factors was downloaded from https://github.com/diyadas/HBC-regen/tree/master/ref).
Dataset Splits	No	The paper conducts simulation studies where data is generated based on predefined graph structures and sample sizes, and evaluates the algorithm's ability to recover the true structure. For real data, it describes preprocessing steps and uses the entire preprocessed dataset for network inference. However, it does not specify train/test/validation splits for model evaluation in the conventional sense.
Hardware Specification	Yes	The runtime analysis (in seconds) was done on a CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux, using R 3.5.1 and 20 cores.
Software Dependencies	Yes	The runtime analysis (in seconds) was done on a CPU: Intel(R) Xeon(R) CPU E5-4650 v3 @ 2.10GHz on Linux, using R 3.5.1 and 20 cores.
Experiment Setup	Yes	PC-LPGM: level of significance of tests 1%; m = 8 for p = 10; m = 3 for p = 100; LPGM: β = 0.05, nlambda = 10, B = 20; λmin/λmax = 0.01; γ = 0.001, sth = 0.9; VSL: β = 0.1, nlambda = 10, B = 20; GLASSO: β = 0.1, nlambda = 10, B = 20; NPN-copula: β = 0.1, nlambda = 10, B = 20; NPN-skeptic: β = 0.1, nlambda = 10, B = 20.
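The Pseudocode row refers to Algorithm 1 and the estimated neighbourhood adj(Ĝ, s). The following minimal Python sketch makes that notation concrete inside a generic PC-style skeleton search. It is not the authors' code, since none is released for PC-LPGM. Several parts are assumptions. The conditional-independence test is a hypothetical callback `ci_test(s, t, S)` that stands in for the PC-LPGM test. `max_order` is our assumed reading of the parameter m in the Experiment Setup row, taken as the maximum size of a conditioning set. Freezing neighbourhoods at the start of each level follows the order-independent "PC-stable" variant, which is our choice and is not stated in the row.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # undirected edges are stored in both directions

def adj(graph: Set[Edge], s: Hashable) -> Set[Hashable]:
    """adj(G_hat, s) = {t in V : (s, t) in G_hat}."""
    return {t for (u, t) in graph if u == s}

def pc_skeleton(nodes: Iterable[Hashable],
                ci_test: Callable[[Hashable, Hashable, Set[Hashable]], bool],
                max_order: int) -> Set[Edge]:
    """PC-style skeleton search (sketch). ci_test(s, t, S) is a hypothetical
    callback returning True when s and t are judged independent given S."""
    nodes = list(nodes)
    # Start from the complete undirected graph.
    graph = {(s, t) for s in nodes for t in nodes if s != t}
    for order in range(max_order + 1):
        # Freeze neighbourhoods at the start of each level (PC-stable variant).
        frozen = {s: adj(graph, s) for s in nodes}
        for s in nodes:
            for t in frozen[s]:
                if (s, t) not in graph:
                    continue  # edge already removed at this level
                candidates = sorted(frozen[s] - {t}, key=repr)
                # Try every conditioning set of the current size drawn from adj(G_hat, s) \ {t}.
                for cond in combinations(candidates, order):
                    if ci_test(s, t, set(cond)):
                        graph.discard((s, t))
                        graph.discard((t, s))
                        break
    return graph
```

As a usage example, consider an oracle for a 4-node chain that declares i and j independent whenever some node strictly between them is in the conditioning set. With that oracle, `pc_skeleton(range(4), oracle, max_order=1)` keeps only the edges {0,1}, {1,2} and {2,3}. In the real method, the oracle would be replaced by a statistical test at the chosen significance level.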
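For the penalized competitors, the Experiment Setup row lists four settings: a threshold β, a path length nlambda, a number of subsamples B and a ratio λmin/λmax. The sketch below shows one common way such settings fit together. It assumes, as our reading rather than something stated in the row, that these are StARS-style stability-selection parameters as used by R packages such as huge. Under that reading, three steps apply. First, build a log-spaced path of nlambda penalties from λmax down to 0.01·λmax. Second, score each penalty by an instability computed over B subsample graphs. Third, keep the smallest penalty whose monotonized instability stays at or below β. All function names are hypothetical. γ and sth are not modelled, and fitting a graph on each subsample is left out.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # assumption: each undirected edge stored once, e.g. as (min, max)

def lambda_path(lambda_max: float, nlambda: int = 10, min_ratio: float = 0.01) -> List[float]:
    """Decreasing, log-spaced path from lambda_max to min_ratio * lambda_max."""
    hi, lo = math.log(lambda_max), math.log(lambda_max * min_ratio)
    step = (hi - lo) / (nlambda - 1)
    return [math.exp(hi - i * step) for i in range(nlambda)]

def instability(subsample_graphs: Sequence[Set[Edge]], num_pairs: int) -> float:
    """Mean over all variable pairs of 2*xi*(1 - xi), where xi is the fraction of
    the B subsample graphs containing the edge (never-selected pairs contribute 0)."""
    B = len(subsample_graphs)
    counts: Dict[Edge, int] = {}
    for edges in subsample_graphs:
        for e in edges:
            counts[e] = counts.get(e, 0) + 1
    return sum(2 * (c / B) * (1 - c / B) for c in counts.values()) / num_pairs

def stars_select(lambdas: Sequence[float], instabilities: Sequence[float], beta: float) -> float:
    """Walk the path from the largest lambda, track the running maximum of the
    instability, and return the smallest lambda whose running maximum is <= beta."""
    chosen = lambdas[0]
    running_max = 0.0
    for lam, d in zip(lambdas, instabilities):
        running_max = max(running_max, d)
        if running_max > beta:
            break
        chosen = lam
    return chosen
```

Taking the running maximum makes the selected penalty ignore a temporary dip in instability further down the path, which is the usual motivation for monotonizing the StARS score.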