reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets

Authors: Sofia Triantafillou, Ioannis Tsamardinos

JMLR 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints which hinders the applicability on real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.
Researcher Affiliation	Academia	Sofia Triantafillou EMAIL Ioannis Tsamardinos EMAIL Institute of Computer Science Foundation for Research and Technology Hellas (FORTH) N. Plastira 100 Vassilika Vouton GR-700 13 Heraklion, Crete, Greece. Also in Department of Computer Science, University of Crete.
Pseudocode	Yes	Algorithm 1: SMCMtoMAG. Algorithm 2: COmbINE. Algorithm 3: initializeSMCM. Algorithm 4: addConstraints. Algorithm 5: MPRstrategy.
Open Source Code	No	The paper mentions using MINISAT2.0, BDAGL, and Cytoscape, which are third-party tools or packages. It also mentions an 'available implementation of SBCSD by its authors'. However, there is no explicit statement or link indicating that the authors have released the source code for their COmbINE algorithm.
Open Datasets	Yes	Finally, we present a proof-of-concept computational experiment by applying the algorithm on 5 heterogeneous data sets from Bendall et al. (2011) and Bodenmiller et al. (2012) measuring overlapping variable sets under 3 different manipulations. The data sets measure protein concentrations in thousands of human cells of the autoimmune system using mass-cytometry technologies.
Dataset Splits	No	The paper mentions simulating data from randomly generated networks for experimental evaluation and discretizing real mass cytometry data into 4 bins. However, it does not provide specific details on how these datasets were split into training, validation, or test sets for reproduction of experiments. Standard dataset splits are not mentioned.
Hardware Specification	No	Both algorithms [COmbINE and SBCSD] were run on the same computer, with 4GB of available memory.
Software Dependencies	Yes	SAT instances were solved using MINISAT2.0 (Eén and Sörensson, 2004) along with the modifications presented in Hyttinen et al. (2013) for iterative solving and computing the backbone with some minor modifications for sequentially performing literal queries. We use the BGE metric for Gaussian distributions (Geiger and Heckerman, 1994) as implemented in the BDAGL package Eaton and Murphy (2007a). We used Cytoscape (Smoot et al., 2011) to visualize the summary graphs produced by COmbINE.
Experiment Setup	Yes	COmbINE default parameters were set as follows: maximum path length = 3, α = 0.1 and maximum conditioning set max K = 5, and the Fisher z-test of conditional independence. As far as orientations are concerned, in our experience, FCI is very prone to error propagation, we therefore used the rule in (Ramsey et al., 2006) for conservative colliders. ... For all network sizes, learning performance monotonically decreases with increased density, while the percentage of dashed features does not significantly vary. The size of the network has a smaller impact on the performance, particularly for the sparser networks.
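
The Experiment Setup entry names the Fisher z-test of conditional independence with α = 0.1 and a maximum conditioning set size of 5. As an illustration only, the following is a minimal Python sketch of a textbook Fisher z-test on a correlation matrix. It is not the authors' code, which is not released. The function names (`partial_corr`, `fisher_z_pvalue`, `is_independent`) and the example correlation matrix are hypothetical and chosen for this sketch.

```python
import math


def partial_corr(corr, x, y, cond=()):
    """Partial correlation of variables x and y given the indices in cond.

    Uses the standard recursive formula on a correlation matrix. Cost grows
    as 3**len(cond), which is acceptable for conditioning sets of size <= 5.
    """
    if not cond:
        return corr[x][y]
    z, rest = cond[0], tuple(cond[1:])
    r_xy = partial_corr(corr, x, y, rest)
    r_xz = partial_corr(corr, x, z, rest)
    r_yz = partial_corr(corr, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(corr, n, x, y, cond=()):
    """Two-sided p-value of the Fisher z-test for X independent of Y given cond.

    n is the sample size behind the correlation matrix.
    """
    dof = n - len(cond) - 3
    if dof <= 0:
        raise ValueError("sample size too small for this conditioning set")
    r = partial_corr(corr, x, y, tuple(cond))
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(dof) * abs(z)
    # For a standard normal, 2 * (1 - Phi(stat)) equals erfc(stat / sqrt(2)).
    return math.erfc(stat / math.sqrt(2))


def is_independent(corr, n, x, y, cond=(), alpha=0.1, max_k=5):
    """Accept independence when the p-value exceeds alpha.

    The defaults mirror the alpha and maximum conditioning set size quoted
    in the table; everything else here is an assumption of this sketch.
    """
    if len(cond) > max_k:
        raise ValueError("conditioning set exceeds max_k")
    return fisher_z_pvalue(corr, n, x, y, cond) > alpha


# Hypothetical correlation matrix for a chain X -> Z -> Y (indices 0, 2, 1):
# corr(X, Y) = corr(X, Z) * corr(Z, Y), so X and Y decouple given Z.
example_corr = [
    [1.00, 0.25, 0.50],
    [0.25, 1.00, 0.50],
    [0.50, 0.50, 1.00],
]
print(is_independent(example_corr, 1000, 0, 1))        # marginal test
print(is_independent(example_corr, 1000, 0, 1, (2,)))  # test given Z
```

In a constraint-based search, a test like this would be called over conditioning sets of increasing size up to the chosen maximum; the loop that does so, and how COmbINE combines the results across data sets, are outside this sketch.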