Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets

Authors: Sofia Triantafillou, Ioannis Tsamardinos

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints which hinders the applicability on real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.
Researcher Affiliation | Academia | Sofia Triantafillou, Ioannis Tsamardinos. Institute of Computer Science, Foundation for Research and Technology, Hellas (FORTH), N. Plastira 100, Vassilika Vouton, GR-700 13 Heraklion, Crete, Greece. Also in Department of Computer Science, University of Crete.
Pseudocode | Yes | Algorithm 1: SMCMtoMAG. Algorithm 2: COmbINE. Algorithm 3: initializeSMCM. Algorithm 4: addConstraints. Algorithm 5: MPRstrategy.
Open Source Code | No | The paper mentions using MINISAT2.0, BDAGL, and Cytoscape, which are third-party tools or packages. It also mentions an 'available implementation of SBCSD by its authors'. However, there is no explicit statement or link indicating that the authors have released the source code for their COmbINE algorithm.
Open Datasets | Yes | Finally, we present a proof-of-concept computational experiment by applying the algorithm on 5 heterogeneous data sets from Bendall et al. (2011) and Bodenmiller et al. (2012) measuring overlapping variable sets under 3 different manipulations. The data sets measure protein concentrations in thousands of human cells of the autoimmune system using mass-cytometry technologies.
Dataset Splits | No | The paper mentions simulating data from randomly generated networks for experimental evaluation and discretizing real mass cytometry data into 4 bins. However, it does not provide specific details on how these datasets were split into training, validation, or test sets for reproduction of experiments. Standard dataset splits are not mentioned.
Hardware Specification | No | Both algorithms [COmbINE and SBCSD] were run on the same computer, with 4GB of available memory.
Software Dependencies | Yes | SAT instances were solved using MINISAT2.0 (Eén and Sörensson, 2004) along with the modifications presented in Hyttinen et al. (2013) for iterative solving and computing the backbone with some minor modifications for sequentially performing literal queries. We use the BGE metric for Gaussian distributions (Geiger and Heckerman, 1994) as implemented in the BDAGL package Eaton and Murphy (2007a). We used Cytoscape (Smoot et al., 2011) to visualize the summary graphs produced by COmbINE.
Experiment Setup | Yes | COmbINE default parameters were set as follows: maximum path length = 3, α = 0.1 and maximum conditioning set maxK = 5, and the Fisher z-test of conditional independence. As far as orientations are concerned, in our experience, FCI is very prone to error propagation, we therefore used the rule in (Ramsey et al., 2006) for conservative colliders. ... For all network sizes, learning performance monotonically decreases with increased density, while the percentage of dashed features does not significantly vary. The size of the network has a smaller impact on the performance, particularly for the sparser networks.
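The Experiment Setup entry names the Fisher z-test of conditional independence with α = 0.1. The sketch below shows how such a test is commonly computed for Gaussian data, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `partial_corr`, `fisher_z_pvalue`, `is_independent`), the column-dictionary data layout, and the simulated chain A -> B -> C are all hypothetical.

```python
import math
import random

ALPHA = 0.1  # significance threshold quoted in the Experiment Setup entry


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, x, y, cond):
    """Partial correlation of columns x and y given the columns in cond,
    via the standard recursive formula (assumes no perfect correlations)."""
    if not cond:
        return pearson(data[x], data[y])
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(data, x, y, cond=()):
    """Two-sided p-value for the null hypothesis 'x independent of y given cond'."""
    n = len(data[x])
    r = partial_corr(data, x, y, list(cond))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def is_independent(data, x, y, cond=(), alpha=ALPHA):
    """Accept independence when the p-value exceeds alpha."""
    return fisher_z_pvalue(data, x, y, cond) > alpha


# Hypothetical simulated chain A -> B -> C (not data from the paper).
random.seed(0)
A = [random.gauss(0, 1) for _ in range(2000)]
B = [0.8 * a + random.gauss(0, 1) for a in A]
C = [0.8 * b + random.gauss(0, 1) for b in B]
data = {"A": A, "B": B, "C": C}

print("p(A indep C)     =", fisher_z_pvalue(data, "A", "C"))
print("p(A indep C | B) =", fisher_z_pvalue(data, "A", "C", ["B"]))
```

In a constraint-based search, a test like this would be queried for conditioning sets up to the configured maximum size (maxK = 5 in the quoted defaults). Larger α values make the test reject independence more readily, so more dependencies are retained.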