Estimating Causal Structure Using Conditional DAG Models

Authors: Chris. J. Oates, Jim Q. Smith, Sach Mukherjee

JMLR 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical results demonstrate gains compared with formulations that treat all variables on an equal footing, or that ignore secondary variables. The methodology is motivated by applications in biology that involve multiple data types and is illustrated here using simulated data and in an analysis of molecular data from the Cancer Genome Atlas. |
| Researcher Affiliation | Academia | Chris. J. Oates, School of Mathematical and Physical Sciences, University of Technology Sydney, NSW 2007, Australia; Jim Q. Smith, Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK; Sach Mukherjee, German Center for Neurodegenerative Diseases (DZNE), 53175 Bonn, Germany |
| Pseudocode | No | The paper describes the Integer Linear Programming (ILP) approach in detail with mathematical formulations, constraints, and propositions in Section 2.6, but it does not present a distinct, clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | "For the applications in this paper, all ILP instances were solved using the GOBNILP software that is freely available to download from http://www.cs.york.ac.uk/aig/sw/gobnilp/." This refers to third-party software used, not open-source code for the specific methodology developed by the authors. |
| Open Datasets | Yes | "The methodology is motivated by applications in biology that involve multiple data types and is illustrated here using simulated data and in an analysis of molecular data from the Cancer Genome Atlas." "The data we analyse are from the TCGA pan-cancer project (Akbani et al., 2014)." |
| Dataset Splits | No | For simulated data, the paper states: "we report the mean SHD as computed over 10 independent realisations of the data." For molecular data: "We focus on p = 24 proteins... The data span eight different cancer types... with a total sample size of n = 3,467 patients." Neither provides specific training/test/validation splits or their percentages, counts, or explicit methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions "all ILP instances were solved using the GOBNILP software" but does not specify a version number for this software. |
| Experiment Setup | Yes | We construct a linear model for the observations $Y_j^l = [1 \; X_j^l]\beta_0 + Y_\pi^l \beta_\pi + \epsilon_j^l$, $\epsilon_j^l \sim N(0, \sigma^2)$ (...). For the parameter prior $p_{j,\pi}(\beta_\pi \mid \beta_0, \sigma)$ we use the g-prior (Zellner, 1986) $\beta_\pi \mid \beta_0, j, \pi \sim N(0, g\sigma^2 (M_\pi^T M_\pi)^{-1})$ where $g$ is a positive constant to be specified. (...). Let $g = n$. (...). For all estimators we considered only models of size $\lvert \pi \rvert \le 5$. |
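
The experiment setup quoted above can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: it builds the g-prior covariance $g\sigma^2 (M_\pi^T M_\pi)^{-1}$ with $g = n$, draws one $\beta_\pi$ from it, and enumerates candidate parent sets of size at most 5. The excerpt does not define $M_\pi$, so the sketch assumes it is simply the $n \times \lvert \pi \rvert$ matrix of parent observations. The function names `g_prior_covariance` and `candidate_parent_sets` and the toy data are hypothetical.

```python
import itertools

import numpy as np


def g_prior_covariance(M, sigma2, g):
    """Return g * sigma^2 * (M^T M)^{-1}, the g-prior covariance for beta_pi."""
    return g * sigma2 * np.linalg.inv(M.T @ M)


def candidate_parent_sets(j, p, max_size=5):
    """Yield every parent set for node j among p nodes with at most max_size members."""
    others = [k for k in range(p) if k != j]
    for size in range(max_size + 1):
        yield from itertools.combinations(others, size)


# Toy data: n samples of p variables (purely synthetic, for illustration).
rng = np.random.default_rng(0)
n, p = 50, 6
Y = rng.standard_normal((n, p))

# Assumption: M_pi is the matrix of parent observations for parent set pi.
pi = (1, 3)
M = Y[:, list(pi)]

sigma2 = 1.0
g = n  # the quoted setup uses g = n
cov = g_prior_covariance(M, sigma2, g)

# One draw of beta_pi from the zero-mean g-prior.
beta_pi = rng.multivariate_normal(np.zeros(len(pi)), cov)

# Number of parent sets that would need a local score for node 0.
n_sets = sum(1 for _ in candidate_parent_sets(0, p))
```

Capping the parent-set size keeps the number of local scores handed to an ILP solver manageable: the count grows polynomially in $p$ rather than as $2^{p-1}$.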