Learning Equivalence Classes of Bayesian Network Structures with GFlowNet

Authors: Michelle Liu, Zhaocheng Zhu, Olexa Bilaniuk, Emmanuel Bengio

TMLR 2025

Reproducibility Variable: Result. LLM Response
Research Type: Experimental. "Experimental results on both simulated and real-world datasets demonstrate that CPDAG-GFN performs competitively with established methods for learning CPDAG candidates from observational data."
Researcher Affiliation: Collaboration. Michelle Liu (Mila, Québec AI Institute), Zhaocheng Zhu (Mila, Québec AI Institute), Olexa Bilaniuk (Mila, Québec AI Institute), Emmanuel Bengio (Valence Labs).
Pseudocode: No. The paper describes the method and its components in sections such as 2.2 (GFlowNet), 3.1 (Heuristic Edge-Sparsity Filter), and 3.2 (GFlowNet Setup), but does not present a clearly labeled pseudocode or algorithm block.
Open Source Code: No. The paper states that "The DAG-GFN was run using the publicly available code from Deleu et al. (2022)," which refers to a baseline; it neither provides concrete access to the source code for CPDAG-GFN nor contains a direct statement of code release for the presented methodology.
Open Datasets: Yes. "In this section, we empirically assess CPDAG-GFN's performance against established methods by comparing the learned CPDAGs to a real-world proteomics dataset obtained from protein signaling networks (Sachs et al., 2005)."
Dataset Splits: No. The paper mentions dataset sizes such as "N = 854 continuous observations" for the Sachs dataset and "observational data sizes ranging from small (100) to large (a million)" for synthetic data, but it does not specify how these datasets are partitioned into training, validation, or test splits, which would be needed for experimental reproduction.
Hardware Specification: Yes. "All experiments were run on an Apple M4 Pro CPU (12 cores) with 24GB unified memory."
Software Dependencies: No. The paper mentions the use of relational graph convolutional networks (RGCN) and the SimplE score function, but it does not specify any software libraries or packages with their corresponding version numbers used for the implementation.
Experiment Setup: Yes. "In addition, we designed our experiments to include several scenarios: different observational data sizes ranging from small (100) to large (a million) to demonstrate scalability, varying levels of noise in the data from small (0.01) to moderate (0.1), and different network complexities with expected degrees of 1d, 2d, and 3d, where d is the number of nodes, which we set to d = 10 in our experiments. For each experimental run, we sample K graphs, with K set to 100."
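The quoted setup can be laid out as an experimental grid. The sketch below is illustrative only, assuming the scenario parameters stated in the paper; the intermediate data sizes and all variable names are our own, not the authors' code, and only the two quoted noise endpoints (0.01 and 0.1) are listed.

```python
from itertools import product

# Hypothetical reconstruction of the experimental grid described in the paper.
# Only the endpoints 100 and 1,000,000 are quoted; intermediate sizes are assumed.
data_sizes = [100, 1_000, 10_000, 100_000, 1_000_000]
noise_levels = [0.01, 0.1]        # "small (0.01) to moderate (0.1)"
d = 10                            # number of nodes, set to d = 10 in the paper
expected_degrees = [1 * d, 2 * d, 3 * d]  # "expected degrees of 1d, 2d, and 3d"
K = 100                           # graphs sampled per experimental run

# One configuration dict per combination of scenario parameters.
runs = [
    {"n_obs": n, "noise": s, "expected_degree": deg, "num_samples": K}
    for n, s, deg in product(data_sizes, noise_levels, expected_degrees)
]
print(len(runs))  # 5 sizes x 2 noise levels x 3 complexities = 30 configurations
```

Enumerating the grid this way makes explicit which combinations a reproduction would need to cover and how many runs (here 30, each sampling K = 100 graphs) that implies.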