Causal Inference through a Witness Protection Program
Authors: Ricardo Silva, Robin Evans
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 9 contains experiments with synthetic and real data. In Section 9, we provide evidence for this claim. We simulate 100 data sets for each one of the four cases (hard case/easy case, with theoretical solution/without theoretical solution), 5000 points per data set, 1000 Monte Carlo samples per decision. |
| Researcher Affiliation | Academia | Ricardo Silva, Department of Statistical Science and CSML, University College London, London WC1E 6BT, UK; Robin Evans, Department of Statistics, University of Oxford, Oxford OX1 3TG, UK |
| Pseudocode | Yes | Algorithm 1: A simplified Witness Protection Program algorithm, assuming the observable distribution $P(W, X, Y)$ is known. Algorithm 2: The outline of the Witness Protection Program algorithm. Algorithm 3: The iterative back-substitution procedure for bounding $L_{xw} \le \omega_{xw} \le U_{xw}$ for all combinations of $x$ and $w$ in $\{0, 1\}^2$. |
| Open Source Code | Yes | Ongoing updates of software for WPP is provided as part of the R package CausalFX, available at the Comprehensive R Archive Network and GitHub. A snapshot of the code used in this paper is available at http://www.homepages.ucl.ac.uk/~ucgtrbd/wpp. |
| Open Datasets | Yes | Our empirical study concerns the effect of influenza vaccination on a patient being later on hospitalized with chest problems. ... The study was originally discussed by McDonald et al. (1992). ... We performed an empirical study with the 1976 Panel Study of Income Dynamics. ... The data was discussed by Mroz (1987) and can be obtained from the R package AER (Kleiber and Zeileis, 2008). |
| Dataset Splits | No | The paper mentions overall sample sizes for datasets: '5000 points per data set' for synthetic studies, '2,681 patients' for the influenza study, and 'sample size is 753' for the income study. It also mentions '1000 Monte Carlo samples per decision' as part of the experimental setup. However, it does not provide explicit training, testing, or validation splits for these datasets. |
| Hardware Specification | Yes | Experiments were run on an Intel Xeon E5-1650 at 3.20 GHz. |
| Software Dependencies | No | The paper mentions several R packages used, such as 'rcdd', 'huge', 'sbgcop', and 'AER', and that 'All code was written in R'. However, it does not specify concrete version numbers for any of these software components or the R environment itself. |
| Experiment Setup | Yes | In the first batch, we set $\epsilon_x = \epsilon_y = \epsilon_w = 0.2$, and $\underline{\beta} = 0.9$, $\bar{\beta} = 1.1$. In the second batch, we change parameters so that $\underline{\beta} = \bar{\beta} = 1$. We simulate 100 data sets for each one of the four cases (hard case/easy case, with theoretical solution/without theoretical solution), 5000 points per data set, 1000 Monte Carlo samples per decision. |
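
The synthetic experiment grid quoted in the Experiment Setup row can be written down as a small configuration sketch. This is only an illustration of the reported counts and parameter values, not the authors' code (which the table says was written in R). All names here (`BatchConfig`, `CASES`, `enumerate_runs`) are hypothetical, and reading the two β values as a lower and an upper bound is an assumption. The second batch is represented only by its β values, since the quoted text does not state its other parameters.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BatchConfig:
    """Relaxation parameters for one batch (hypothetical container)."""
    eps_x: float
    eps_y: float
    eps_w: float
    beta_lower: float  # assumption: the smaller beta value is a lower bound
    beta_upper: float  # assumption: the larger beta value is an upper bound


# Counts quoted in the table.
N_DATASETS_PER_CASE = 100      # simulated data sets per case
N_POINTS_PER_DATASET = 5000    # points per data set
N_MC_SAMPLES = 1000            # Monte Carlo samples per decision

# The four cases: (hard/easy) x (with/without theoretical solution).
CASES = list(product(("hard", "easy"), ("with_solution", "without_solution")))

# First batch as quoted: all epsilons 0.2, beta values 0.9 and 1.1.
FIRST_BATCH = BatchConfig(eps_x=0.2, eps_y=0.2, eps_w=0.2,
                          beta_lower=0.9, beta_upper=1.1)

# Second batch: only the beta values are given in the quoted text.
SECOND_BATCH_BETAS = (1.0, 1.0)


def enumerate_runs():
    """List every (difficulty, solution, dataset_index) run in one batch."""
    return [
        (difficulty, solution, index)
        for difficulty, solution in CASES
        for index in range(N_DATASETS_PER_CASE)
    ]


if __name__ == "__main__":
    runs = enumerate_runs()
    print(f"{len(CASES)} cases, {len(runs)} data sets per batch")
```

Enumerating runs this way makes the scale explicit: four cases times 100 data sets gives 400 simulated data sets per batch.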