Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks
Authors: Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, Bernhard Schölkopf
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. |
| Researcher Affiliation | Academia | Joris M. Mooij EMAIL Institute for Informatics, University of Amsterdam, Postbox 94323, 1090 GH Amsterdam, The Netherlands; Jonas Peters EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany; Dominik Janzing EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany; Jakob Zscheischler EMAIL Institute for Atmospheric and Climate Science, ETH Zürich, Universitätstrasse 16, 8092 Zürich, Switzerland; Bernhard Schölkopf EMAIL Max Planck Institute for Intelligent Systems, Spemannstraße 38, 72076 Tübingen, Germany |
| Pseudocode | Yes | Algorithm 1: General procedure to decide whether p(x, y) satisfies an Additive Noise Model X → Y or Y → X. Algorithm 2: Procedure to decide whether p(x, y) satisfies an Additive Noise Model X → Y or Y → X, suitable for empirical-Bayes or MML model selection. Algorithm 3: General procedure to decide whether P_{X,Y} is generated by a deterministic monotonic bijective function from X to Y or from Y to X. |
| Open Source Code | Yes | In addition, all the code (including the code to run the experiments and create the figures) is provided both as an online appendix and on the first author's homepage (http://www.jorismooij.nl/) under an open source license to allow others to reproduce and build on our work. |
| Open Datasets | Yes | We present the benchmark Cause Effect Pairs that consists of data for 100 different cause-effect pairs selected from 37 data sets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the ground truth causal directions of all pairs... The Cause Effect Pairs benchmark data are provided on our website (Mooij et al., 2014). |
| Dataset Splits | Yes | Suppose we have two data sets, a training data set D_N := {(x_n, y_n)}_{n=1}^{N} (for estimating the function) and a test data set D'_{N'} := {(x'_n, y'_n)}_{n=1}^{N'} (for testing independence of residuals), both consisting of i.i.d. samples distributed according to p(x, y). We will consider two scenarios: the data splitting scenario, where training and test set are independent (typically achieved by splitting a bigger data set into two parts), and the data recycling scenario, in which the training and test data are identical (where we use the same data twice for different purposes: regression and independence testing). |
| Hardware Specification | Yes | We used a machine with Intel Xeon CPU E5-2680 v2 @ 2.80GHz processors, 40 cores, and 125 GB of RAM. |
| Software Dependencies | Yes | We used MATLAB on a Linux platform, and made use of the external libraries GPML v3.5 (2014-12-08) (Rasmussen and Nickisch, 2010) for GP regression and ITE v0.61 (Szabó, 2014) for entropy estimation. For parallelization, we used the convenient command line tool GNU parallel (Tange, 2011). |
| Experiment Setup | Yes | Both variables X and Y were standardized (i.e., an affine transformation is applied on both variables such that their empirical mean becomes 0, and their empirical standard deviation becomes 1). In order to study the effect of discretization and other small perturbations of the data, one of these four perturbations was applied: unperturbed, discretized, undiscretized, small noise. We used a squared exponential covariance function, constant mean function, and an additive Gaussian noise likelihood. We used the FITC approximation... We found that 100 FITC points distributed on a linearly spaced grid greatly reduce computation time... Therefore, we used this setting as a default for the GP regression. |
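The general ANM decision procedure quoted in the Pseudocode row (Algorithm 1) can be sketched as follows: fit a regression in each candidate direction and prefer the direction whose residuals are less dependent on the putative cause. This is a minimal illustration only, not the paper's implementation: the paper uses GP regression and an HSIC independence test, whereas this sketch substitutes a polynomial fit and a crude biased HSIC estimate with a fixed bandwidth; all function names here are ours.

```python
import numpy as np

def _standardize(v):
    # As in the paper's setup: zero empirical mean, unit empirical std.
    return (v - v.mean()) / v.std()

def _hsic(a, b):
    # Biased HSIC estimate with Gaussian kernels (bandwidth 1 on
    # standardized inputs) -- a crude stand-in for the HSIC test
    # used in the paper, for illustration only.
    n = len(a)
    K = np.exp(-0.5 * (a[:, None] - a[None, :]) ** 2)
    L = np.exp(-0.5 * (b[:, None] - b[None, :]) ** 2)
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def anm_direction(x, y, degree=3):
    # Hypothesized direction is first argument -> second argument.
    x, y = _standardize(x), _standardize(y)

    def residual_dependence(cause, effect):
        # Regress effect on cause (polynomial fit standing in for the
        # paper's GP regression), then measure dependence between the
        # residuals and the cause.
        coeffs = np.polyfit(cause, effect, degree)
        residuals = effect - np.polyval(coeffs, cause)
        return _hsic(cause, _standardize(residuals))

    return "X->Y" if residual_dependence(x, y) < residual_dependence(y, x) else "Y->X"
```

On synthetic data generated by an additive noise model, e.g. `y = x + x**3 + noise` with `x` drawn uniformly, this sketch typically recovers the causal direction, while both directions score similarly when the relation is linear with Gaussian noise (the known unidentifiable case).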
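The two evaluation scenarios described in the Dataset Splits row can be illustrated with a small helper. The function name and interface below are hypothetical; only the splitting/recycling logic follows the quoted description (disjoint halves for regression and independence testing versus the same sample used twice).

```python
import numpy as np

def make_train_test(x, y, recycle=False, seed=0):
    """Return (train, test) pairs for the two scenarios in the paper:
    data splitting (disjoint random halves) or data recycling (the
    identical sample is used for both regression and the residual
    independence test)."""
    if recycle:
        return (x, y), (x, y)  # data recycling: same data used twice
    # Data splitting: random permutation, then two disjoint halves.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    half = len(x) // 2
    tr, te = perm[:half], perm[half:]
    return (x[tr], y[tr]), (x[te], y[te])
```

Data recycling uses all samples for both steps but risks overfitting-induced dependence between residuals and input; data splitting avoids that at the cost of halving the sample size for each step.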