Causal Inference through a Witness Protection Program

Authors: Ricardo Silva, Robin Evans

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 9 contains experiments with synthetic and real data. In Section 9, we provide evidence for this claim. We simulate 100 data sets for each one of the four cases (hard case/easy case, with theoretical solution/without theoretical solution), with 5000 points per data set and 1000 Monte Carlo samples per decision.
Researcher Affiliation | Academia | Ricardo Silva, Department of Statistical Science and CSML, University College London, London WC1E 6BT, UK; Robin Evans, Department of Statistics, University of Oxford, Oxford OX1 3TG, UK.
Pseudocode | Yes | Algorithm 1: a simplified Witness Protection Program algorithm, assuming the observable distribution P(W, X, Y) is known. Algorithm 2: the outline of the Witness Protection Program algorithm. Algorithm 3: the iterative back-substitution procedure for bounding $L_{xw} \le \omega_{xw} \le U_{xw}$ for all combinations of $x$ and $w$ in $\{0, 1\}^2$.
Open Source Code | Yes | Ongoing updates of the WPP software are provided as part of the R package CausalFX, available on the Comprehensive R Archive Network (CRAN) and GitHub. A snapshot of the code used in this paper is available at http://www.homepages.ucl.ac.uk/~ucgtrbd/wpp.
Open Datasets | Yes | Our empirical study concerns the effect of influenza vaccination on a patient later being hospitalized with chest problems. ... The study was originally discussed by McDonald et al. (1992). ... We performed an empirical study with the 1976 Panel Study of Income Dynamics. ... The data were discussed by Mroz (1987) and can be obtained from the R package AER (Kleiber and Zeileis, 2008).
Dataset Splits | No | The paper reports overall sample sizes for its datasets: '5000 points per data set' for the synthetic studies, '2,681 patients' for the influenza study, and 'sample size is 753' for the income study. It also mentions '1000 Monte Carlo samples per decision' as part of the experimental setup. However, it does not provide explicit training, validation, or test splits for these datasets.
Hardware Specification | Yes | Experiments were run on an Intel Xeon E5-1650 at 3.20 GHz.
Software Dependencies | No | The paper mentions several R packages, such as 'rcdd', 'huge', 'sbgcop', and 'AER', and states that 'All code was written in R'. However, it does not specify version numbers for any of these packages or for the R environment itself.
Experiment Setup | Yes | In the first batch, we set $\epsilon_x = \epsilon_y = \epsilon_w = 0.2$, $\underline{\beta} = 0.9$ and $\overline{\beta} = 1.1$. In the second batch, we change parameters so that $\underline{\beta} = \overline{\beta} = 1$. We simulate 100 data sets for each one of the four cases (hard case/easy case, with theoretical solution/without theoretical solution), with 5000 points per data set and 1000 Monte Carlo samples per decision.
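To make the scale of the synthetic study concrete, here is a back-of-the-envelope sketch that tallies the design using only the figures quoted in the report. It does not reproduce the paper's data-generating model, which the report does not describe. All names (`CASES`, `BATCHES`, `design_summary`) are hypothetical. The report also does not say whether the four cases are repeated for each parameter batch, so the count covers a single set of four cases. The epsilon values are recorded only for the first batch, because the report does not restate them for the second.

```python
from itertools import product

# Figures quoted in the report (Section 9 of the paper).
N_DATASETS_PER_CASE = 100
N_POINTS_PER_DATASET = 5000
N_MC_SAMPLES_PER_DECISION = 1000  # kept for reference; number of decisions is not stated

# The four cases: (hard or easy) x (with or without a theoretical solution).
CASES = list(product(["hard", "easy"], ["with_solution", "without_solution"]))

# Relaxation settings per batch, as quoted; epsilons are not restated for batch two.
BATCHES = {
    "first": {"eps_x": 0.2, "eps_y": 0.2, "eps_w": 0.2,
              "beta_lower": 0.9, "beta_upper": 1.1},
    "second": {"beta_lower": 1.0, "beta_upper": 1.0},
}


def design_summary():
    """Count cases, simulated data sets and sampled points for one set of four cases."""
    n_cases = len(CASES)
    n_datasets = n_cases * N_DATASETS_PER_CASE
    n_points = n_datasets * N_POINTS_PER_DATASET
    return {"cases": n_cases, "datasets": n_datasets, "points": n_points}


summary = design_summary()
print(summary)  # {'cases': 4, 'datasets': 400, 'points': 2000000}
```

Under these assumptions, one set of four cases amounts to 400 simulated data sets and two million sampled points. Each decision additionally draws 1000 Monte Carlo samples.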