Estimating Causal Structure Using Conditional DAG Models
Authors: Chris. J. Oates, Jim Q. Smith, Sach Mukherjee
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate gains compared with formulations that treat all variables on an equal footing, or that ignore secondary variables. The methodology is motivated by applications in biology that involve multiple data types and is illustrated here using simulated data and in an analysis of molecular data from the Cancer Genome Atlas. |
| Researcher Affiliation | Academia | Chris. J. Oates (EMAIL), School of Mathematical and Physical Sciences, University of Technology Sydney, NSW 2007, Australia; Jim Q. Smith (EMAIL), Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK; Sach Mukherjee (EMAIL), German Center for Neurodegenerative Diseases (DZNE), 53175 Bonn, Germany |
| Pseudocode | No | The paper describes the Integer Linear Programming (ILP) approach in detail with mathematical formulations, constraints, and propositions in Section 2.6, but it does not present a distinct, clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | "For the applications in this paper, all ILP instances were solved using the GOBNILP software that is freely available to download from http://www.cs.york.ac.uk/aig/sw/gobnilp/." This refers to third-party software that the authors used, not to open-source code for the methodology developed in the paper. |
| Open Datasets | Yes | The methodology is motivated by applications in biology that involve multiple data types and is illustrated here using simulated data and in an analysis of molecular data from the Cancer Genome Atlas. The data we analyse are from the TCGA pan-cancer project (Akbani et al., 2014) |
| Dataset Splits | No | For simulated data, the paper states: "we report the mean SHD as computed over 10 independent realisations of the data." For molecular data: "We focus on p = 24 proteins... The data span eight different cancer types... with a total sample size of n = 3,467 patients." Neither provides specific training/test/validation splits or their percentages, counts, or explicit methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions "all ILP instances were solved using the GOBNILP software" but does not specify a version number for this software. |
| Experiment Setup | Yes | We construct a linear model for the observations $Y^l_j = [1 \;\; X^l_j]\beta_0 + Y^l_\pi \beta_\pi + \epsilon^l_j$, $\epsilon^l_j \sim N(0, \sigma^2)$ (...). For the parameter prior $p_{j,\pi}(\beta_\pi \mid \beta_0, \sigma)$ we use the g-prior (Zellner, 1986) $\beta_\pi \mid \beta_0, j, \pi \sim N(0, g\sigma^2 (M_\pi^\top M_\pi)^{-1})$ where $g$ is a positive constant to be specified. (...). Let $g = n$. (...). For all estimators we considered only models of size $\lvert \pi \rvert \le 5$. |
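
The experiment setup quoted in the table's last row can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `g_prior_covariance` and `candidate_parent_sets` are hypothetical, and `M` is assumed here to be a generic full-rank n × k design matrix standing in for the matrix written M_π in the quoted setup, whose exact construction the excerpt does not give. The sketch computes the g-prior covariance with g = n, draws one coefficient vector from it, and enumerates candidate parent sets of size at most 5.

```python
import itertools

import numpy as np


def g_prior_covariance(M, sigma, g):
    """Return g * sigma^2 * (M^T M)^{-1}, the covariance of Zellner's g-prior.

    M is an assumed (n x k) full-rank design matrix (hypothetical stand-in
    for the paper's M_pi).
    """
    return g * sigma**2 * np.linalg.inv(M.T @ M)


def candidate_parent_sets(p, j, max_size=5):
    """Yield every parent set for node j among p nodes with size <= max_size."""
    others = [k for k in range(p) if k != j]
    for size in range(max_size + 1):
        yield from itertools.combinations(others, size)


rng = np.random.default_rng(0)
n, k = 50, 3

# Assumed design matrix for one candidate parent set with k parents.
M = rng.standard_normal((n, k))

# The quoted setup takes g = n.
cov = g_prior_covariance(M, sigma=1.0, g=n)

# One draw of the parent coefficients from the zero-mean g-prior.
beta = rng.multivariate_normal(np.zeros(k), cov)

# Count candidate parent sets for node 0 in a toy problem with 8 nodes.
num_sets = sum(1 for _ in candidate_parent_sets(p=8, j=0))
```

Capping the parent-set size keeps the number of candidate sets, and hence the number of local scores passed to an ILP solver, manageable as the number of variables grows.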