Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets
Authors: Sofia Triantafillou, Ioannis Tsamardinos
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints which hinders the applicability on real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions. |
| Researcher Affiliation | Academia | Sofia Triantafillou EMAIL Ioannis Tsamardinos EMAIL Institute of Computer Science Foundation for Research and Technology Hellas (FORTH) N. Plastira 100 Vassilika Vouton GR-700 13 Heraklion, Crete, Greece. Also in Department of Computer Science, University of Crete. |
| Pseudocode | Yes | Algorithm 1: SMCMtoMAG. Algorithm 2: COmbINE. Algorithm 3: initializeSMCM. Algorithm 4: addConstraints. Algorithm 5: MPRstrategy. |
| Open Source Code | No | The paper mentions using MINISAT2.0, BDAGL, and Cytoscape, which are third-party tools or packages. It also mentions an 'available implementation of SBCSD by its authors'. However, there is no explicit statement or link indicating that the authors have released the source code for their COmbINE algorithm. |
| Open Datasets | Yes | Finally, we present a proof-of-concept computational experiment by applying the algorithm on 5 heterogeneous data sets from Bendall et al. (2011) and Bodenmiller et al. (2012) measuring overlapping variable sets under 3 different manipulations. The data sets measure protein concentrations in thousands of human cells of the autoimmune system using mass-cytometry technologies. |
| Dataset Splits | No | The paper mentions simulating data from randomly generated networks for experimental evaluation and discretizing real mass cytometry data into 4 bins. However, it does not provide specific details on how these datasets were split into training, validation, or test sets for reproduction of experiments. Standard dataset splits are not mentioned. |
| Hardware Specification | No | Both algorithms [COmbINE and SBCSD] were run on the same computer, with 4GB of available memory. |
| Software Dependencies | Yes | SAT instances were solved using MINISAT2.0 (Eén and Sörensson, 2004) along with the modifications presented in Hyttinen et al. (2013) for iterative solving and computing the backbone with some minor modifications for sequentially performing literal queries. We use the BGE metric for Gaussian distributions (Geiger and Heckerman, 1994) as implemented in the BDAGL package (Eaton and Murphy, 2007a). We used Cytoscape (Smoot et al., 2011) to visualize the summary graphs produced by COmbINE. |
| Experiment Setup | Yes | COmbINE default parameters were set as follows: maximum path length = 3, α = 0.1 and maximum conditioning set maxK = 5, and the Fisher z-test of conditional independence. As far as orientations are concerned, in our experience, FCI is very prone to error propagation, we therefore used the rule in (Ramsey et al., 2006) for conservative colliders. ... For all network sizes, learning performance monotonically decreases with increased density, while the percentage of dashed features does not significantly vary. The size of the network has a smaller impact on the performance, particularly for the sparser networks. |
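
The Experiment Setup row names the Fisher z-test of conditional independence with α = 0.1 as the default test. The sketch below illustrates how such a test is commonly computed from partial correlations. It is not the authors' implementation: the function name `fisher_z_pvalue`, the simulated chain X → Z → Y, and the sample size are assumptions made for illustration; only `ALPHA = 0.1` comes from the reported defaults.

```python
import math
import numpy as np

ALPHA = 0.1  # significance threshold reported as the COmbINE default


def fisher_z_pvalue(data, x, y, cond=()):
    """Two-sided p-value for H0: column x is independent of column y
    given the columns in cond, assuming jointly Gaussian data.
    (Hypothetical helper, not taken from the paper.)"""
    n = data.shape[0]
    dof = n - len(cond) - 3
    if dof <= 0:
        raise ValueError("too few samples for this conditioning set")
    idx = [x, y, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected variables
    # Partial correlation of x and y given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep the log below finite
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(dof) * abs(z)
    # Two-sided standard normal tail: 2 * (1 - Phi(stat)) = erfc(stat / sqrt(2)).
    return math.erfc(stat / math.sqrt(2))


# Illustrative data from an assumed chain X -> Z -> Y (columns 0, 1, 2).
rng = np.random.default_rng(0)
x_col = rng.normal(size=2000)
z_col = x_col + rng.normal(size=2000)
y_col = z_col + rng.normal(size=2000)
samples = np.column_stack([x_col, z_col, y_col])

p_marginal = fisher_z_pvalue(samples, 0, 2)            # X vs Y
p_conditional = fisher_z_pvalue(samples, 0, 2, (1,))   # X vs Y given Z
reject_marginal_independence = p_marginal < ALPHA
```

In a constraint-based search, a p-value above the threshold is treated as an accepted independence. The maximum conditioning set parameter in the same row would bound the length of `cond` passed to such a test.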