Block Domain Knowledge-Driven Learning of Chain Graphs Structure

Authors: Shujing Yang, Fuyuan Cao

JAIR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Meanwhile, we conduct theoretical analysis to prove the correctness of our algorithm and compare it with the LCD algorithm and MbLWF algorithm on synthetic and real-world datasets. The experimental results validate the effectiveness of our algorithm. Section 4 reports experimental results to illustrate the performance of the KDLCG algorithm.
Researcher Affiliation | Academia | Shujing Yang EMAIL; Fuyuan Cao (corresponding author) EMAIL; School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China
Pseudocode | Yes | Algorithm 1: learn the Adj and SP (learn-AS); Algorithm 2: learn LWF CG structure (KDLCG)
Open Source Code | No | The paper states: "We implemented all algorithms in R by extending the code from the bnlearn (Javidian et al., 2020b), lcd (Ma et al., 2008), and pcalg (Kalisch et al., 2012) packages to LWF CGs." This indicates the authors built on existing packages but do not provide a link or an explicit statement that their own KDLCG implementation is open source or otherwise available.
Open Datasets | Yes | To evaluate the performance of the proposed algorithm, we perform extensive experiments to contrast our proposed KDLCG algorithm against the state-of-the-art LCD algorithm and MbLWF algorithm. ... we verify the effectiveness of all algorithms on time series datasets of insilico_size10_1 and insilico_size10_2 with feedback loops (n = 105, p = 10) in the DREAM4 Network Inference Challenge (Marbach, Schaffter, Mattiussi, & Floreano, 2009; Greenfield, Madar, Ostrer, & Bonneau, 2010).
Dataset Splits | No | For synthetic data, the paper mentions "training databases with the sample size of n = 500 or 5000 from this probability distribution" and "training databases with the sample size of n = 50 or 100 from this probability distribution." For real-world data, it mentions "time series datasets of insilico_size10_1 and insilico_size10_2 with feedback loops (n = 105, p = 10)." These specify total sample sizes but no explicit train/validation/test splits.
Hardware Specification | No | The paper uses running time as an evaluation metric but gives no details about the hardware (CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | "We implemented all algorithms in R by extending the code from the bnlearn (Javidian et al., 2020b), lcd (Ma et al., 2008), and pcalg (Kalisch et al., 2012) packages to LWF CGs." Specific version numbers for R and the listed packages are not provided.
Experiment Setup | Yes | For each sample, the significance levels α of the LCD algorithm, the MbLWF algorithm, and our KDLCG algorithm are respectively set at the values of 0.005 or 0.05 to perform the hypothesis tests. We generate a random chain graph on V as follows: (1) order the p vertices and initialize a p × p adjacency matrix A with zeros; (2) for each element in the lower triangle of A, set it to a random number drawn from a Bernoulli distribution with occurrence probability s = N/(p − 1); ... (6) set A_ij = 0 for any pair (i, j) such that i ∈ I_l, j ∈ I_m with l > m. We consider random CGs with p ∈ {10, 20, 40} and N ∈ {2, 3}.
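The quoted generation recipe shows only steps (1), (2), and (6); steps (3)-(5) are elided in the excerpt. A rough sketch of the quoted steps is given below. This is not the authors' R code: the function name, the use of Python/NumPy, and the `components` argument, which stands in for the chain-component partition I_1, ..., I_k produced by the elided steps, are all assumptions made for illustration.

```python
import numpy as np

def random_cg_adjacency(p, N, components, rng=None):
    """Sketch of the random chain-graph adjacency generation quoted above.

    components: an ordered partition of {0, ..., p-1} into chain
    components I_1, ..., I_k (how the paper constructs this partition
    is in the elided steps (3)-(5), so it is taken as input here).
    """
    rng = np.random.default_rng(rng)
    s = N / (p - 1)                      # edge-occurrence probability
    A = np.zeros((p, p), dtype=int)      # step (1): p x p zero matrix
    # step (2): fill the lower triangle with Bernoulli(s) draws
    for i in range(p):
        for j in range(i):
            A[i, j] = rng.random() < s
    # step (6): zero out A_ij whenever i lies in a later chain
    # component than j (i in I_l, j in I_m with l > m)
    comp_of = {}
    for l, comp in enumerate(components):
        for v in comp:
            comp_of[v] = l
    for i in range(p):
        for j in range(p):
            if comp_of[i] > comp_of[j]:
                A[i, j] = 0
    return A
```

Step (6) removes any entry pointing from a later chain component back into an earlier one, which enforces the acyclic ordering of chain components that a valid chain graph requires.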