Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks
Authors: Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. |
| Researcher Affiliation | Academia | Joris M. Mooij EMAIL Institute for Informatics, University of Amsterdam, Postbox 94323, 1090 GH Amsterdam, The Netherlands; Jonas Peters EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany; Dominik Janzing EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany; Jakob Zscheischler EMAIL Institute for Atmospheric and Climate Science, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; Bernhard Schölkopf EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany |
| Pseudocode | Yes | Algorithm 1: General procedure to decide whether p(x, y) satisfies an Additive Noise Model X → Y or Y → X. Algorithm 2: Procedure to decide whether p(x, y) satisfies an Additive Noise Model X → Y or Y → X, suitable for empirical-Bayes or MML model selection. Algorithm 3: General procedure to decide whether P_{X,Y} is generated by a deterministic monotonic bijective function from X to Y or from Y to X. |
| Open Source Code | Yes | In addition, all the code (including the code to run the experiments and create the figures) is provided both as an online appendix and on the first author's homepage (http://www.jorismooij.nl/) under an open source license to allow others to reproduce and build on our work. |
| Open Datasets | Yes | We present the benchmark Cause Effect Pairs that consists of data for 100 different cause-effect pairs selected from 37 data sets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the ground truth causal directions of all pairs... The Cause Effect Pairs benchmark data are provided on our website (Mooij et al., 2014). |
| Dataset Splits | Yes | Suppose we have two data sets, a training data set D_N := {(x_n, y_n)}_{n=1}^{N} (for estimating the function) and a test data set D'_{N'} := {(x'_n, y'_n)}_{n=1}^{N'} (for testing independence of residuals), both consisting of i.i.d. samples distributed according to p(x, y). We will consider two scenarios: the data splitting scenario, where training and test set are independent (typically achieved by splitting a bigger data set into two parts), and the data recycling scenario, in which the training and test data are identical (where we use the same data twice for different purposes: regression and independence testing). |
| Hardware Specification | Yes | We used a machine with Intel Xeon CPU E5-2680 v2 @ 2.80GHz processors, 40 cores, and 125 GB of RAM. |
| Software Dependencies | Yes | We used MATLAB on a Linux platform, and made use of the external libraries GPML v3.5 (2014-12-08) (Rasmussen and Nickisch, 2010) for GP regression and ITE v0.61 (Szabó, 2014) for entropy estimation. For parallelization, we used the convenient command line tool GNU parallel (Tange, 2011). |
| Experiment Setup | Yes | Both variables X and Y were standardized (i.e., an affine transformation is applied on both variables such that their empirical mean becomes 0, and their empirical standard deviation becomes 1). In order to study the effect of discretization and other small perturbations of the data, one of these four perturbations was applied: unperturbed, discretized, undiscretized, small noise. We used a squared exponential covariance function, constant mean function, and an additive Gaussian noise likelihood. We used the FITC approximation... We found that 100 FITC points distributed on a linearly spaced grid greatly reduce computation time... Therefore, we used this setting as a default for the GP regression. |
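The general ANM decision procedure quoted in the Pseudocode row (Algorithm 1) can be sketched as follows: fit a regression in each candidate direction and prefer the direction whose residuals are less dependent on the putative cause. This is a minimal illustration only, not the paper's implementation: the paper uses GP regression and an HSIC independence test, whereas this sketch substitutes a polynomial fit and a crude biased HSIC estimate with a fixed bandwidth; all function names here are ours.

```python
import numpy as np

def _standardize(v):
    # As in the paper's setup: zero empirical mean, unit empirical std.
    return (v - v.mean()) / v.std()

def _hsic(a, b):
    # Biased HSIC estimate with Gaussian kernels (bandwidth 1 on
    # standardized inputs) -- a crude stand-in for the HSIC test
    # used in the paper, for illustration only.
    n = len(a)
    K = np.exp(-0.5 * (a[:, None] - a[None, :]) ** 2)
    L = np.exp(-0.5 * (b[:, None] - b[None, :]) ** 2)
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def anm_direction(x, y, degree=3):
    # Hypothesized direction is first argument -> second argument.
    x, y = _standardize(x), _standardize(y)

    def residual_dependence(cause, effect):
        # Regress effect on cause (polynomial fit standing in for the
        # paper's GP regression), then measure dependence between the
        # residuals and the cause.
        coeffs = np.polyfit(cause, effect, degree)
        residuals = effect - np.polyval(coeffs, cause)
        return _hsic(cause, _standardize(residuals))

    return "X->Y" if residual_dependence(x, y) < residual_dependence(y, x) else "Y->X"
```

On synthetic data generated by an additive noise model, e.g. `y = x + x**3 + noise` with `x` drawn uniformly, this sketch typically recovers the causal direction, while both directions score similarly when the relation is linear with Gaussian noise (the known unidentifiable case).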
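The two evaluation scenarios described in the Dataset Splits row can be illustrated with a small helper. The function name and interface below are hypothetical; only the splitting/recycling logic follows the quoted description (disjoint halves for regression and independence testing versus the same sample used twice).

```python
import numpy as np

def make_train_test(x, y, recycle=False, seed=0):
    """Return (train, test) pairs for the two scenarios in the paper:
    data splitting (disjoint random halves) or data recycling (the
    identical sample is used for both regression and the residual
    independence test)."""
    if recycle:
        return (x, y), (x, y)  # data recycling: same data used twice
    # Data splitting: random permutation, then two disjoint halves.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    half = len(x) // 2
    tr, te = perm[:half], perm[half:]
    return (x[tr], y[tr]), (x[te], y[te])
```

Data recycling uses all samples for both steps but risks overfitting-induced dependence between residuals and input; data splitting avoids that at the cost of halving the sample size for each step.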