The Randomized Causation Coefficient

Authors: David Lopez-Paz, Krikamol Muandet, Benjamin Recht

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an array of experiments to test the effectiveness of a simple implementation of the presented causal learning framework. Given the use of random embeddings (4) in our classifier, we term our method the Randomized Causation Coefficient (RCC). Throughout our simulations, we featurize each sample S = {(x_i, y_i)}_{i=1}^n as ν(S) = (µ_{k,m}(P_S^x), µ_{k,m}(P_S^y), µ_{k,m}(P_S)) (5). (...) Figure 1 plots the classification accuracy of RCC, IGCI (Daniusis et al., 2012), and ANM (Mooij et al., 2014) versus the fraction of decisions that the algorithms are forced to make out of the 82 scalar Tübingen cause-effect pairs. (...) We tested RCC at the ChaLearn Fast Causation Coefficient challenge (Guyon, 2014).
Researcher Affiliation | Academia | David Lopez-Paz (EMAIL), Max-Planck-Institute for Intelligent Systems, Spemannstrasse 38, 72076 Tübingen, Germany; Krikamol Muandet (EMAIL), Max-Planck-Institute for Intelligent Systems, Spemannstrasse 38, 72076 Tübingen, Germany; Benjamin Recht (EMAIL), Department of EECS, University of California Berkeley, 387 Soda Hall, Berkeley, CA 94720.
Pseudocode | No | No explicit pseudocode or algorithm block is present. The methodology is described using mathematical equations and textual explanations.
Open Source Code | Yes | The source code of our experiments is available at https://github.com/lopezpaz/causation_learning_theory.
Open Datasets | Yes | The Tübingen cause-effect pairs data set can be downloaded at https://webdav.tuebingen.mpg.de/cause-effect/. We tested RCC at the ChaLearn Fast Causation Coefficient challenge (Guyon, 2014). URL https://www.codalab.org/competitions/1381.
Dataset Splits | Yes | Given the small size of this data set, we resort to the synthesis of some Mother distribution to sample our training data from. (...) we construct the synthetic training data {(ν({(x̂_ij, ŷ_ij)}_{j=1}^n), +1)}_{i=1}^N ∪ {(ν({(ŷ_ij, x̂_ij)}_{j=1}^n), −1)}_{i=1}^N, where {(x̂_ij, ŷ_ij)}_{j=1}^n = Ŝ_i, and train our classifier on it. Figure 1 plots the classification accuracy of RCC, IGCI (Daniusis et al., 2012), and ANM (Mooij et al., 2014) versus the fraction of decisions that the algorithms are forced to make out of the 82 scalar Tübingen cause-effect pairs. (...) We trained a Gradient Boosting Classifier (GBC), with hyper-parameters chosen via a 4-fold cross validation, on the featurizations (5) of the training data.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | Yes | To classify the embeddings (5) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git.
Experiment Setup | Yes | In practice, we set m = 1000, and observe no significant improvements when using larger amounts of random features. To classify the embeddings (5) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git. The number of trees forming the forest is chosen from the set {100, 250, 500, 1000, 5000} via cross-validation. (...) Each of these three embeddings has random features sampled to approximate the sum of three Gaussian kernels (2) with hyper-parameters 0.1γ, γ, and 10γ, where γ is set using the median heuristic. (...) We trained a Gradient Boosting Classifier (GBC), with hyper-parameters chosen via a 4-fold cross validation.
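The featurization ν(S) of Eq. (5) quoted above can be sketched with random Fourier features, one standard way to build the random embeddings (4). Everything below (the function names, the single-bandwidth Gaussian kernel, and the choice m = 50 for brevity; the paper uses m = 1000) is illustrative, not the authors' released code.

```python
import numpy as np

def random_fourier_features(Z, W, b):
    """Map samples Z of shape (n, d) through m random cosine features that
    approximate a Gaussian kernel: z -> sqrt(2/m) * cos(W z + b)."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(Z @ W.T + b)

def featurize_sample(x, y, W_x, b_x, W_y, b_y, W_xy, b_xy):
    """Concatenate the empirical mean embeddings of P(x), P(y), and
    P(x, y), mirroring the structure of Eq. (5) (a sketch only)."""
    xy = np.column_stack([x, y])
    mu_x = random_fourier_features(x[:, None], W_x, b_x).mean(axis=0)
    mu_y = random_fourier_features(y[:, None], W_y, b_y).mean(axis=0)
    mu_xy = random_fourier_features(xy, W_xy, b_xy).mean(axis=0)
    return np.concatenate([mu_x, mu_y, mu_xy])

# Illustrative use: frequencies for a Gaussian kernel with bandwidth gamma
# are drawn from N(0, I / gamma^2); phases are uniform on [0, 2*pi).
rng = np.random.default_rng(0)
n, m, gamma = 100, 50, 1.0
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)
W_x = rng.normal(scale=1.0 / gamma, size=(m, 1)); b_x = rng.uniform(0, 2 * np.pi, m)
W_y = rng.normal(scale=1.0 / gamma, size=(m, 1)); b_y = rng.uniform(0, 2 * np.pi, m)
W_xy = rng.normal(scale=1.0 / gamma, size=(m, 2)); b_xy = rng.uniform(0, 2 * np.pi, m)
nu = featurize_sample(x, y, W_x, b_x, W_y, b_y, W_xy, b_xy)  # shape (3 * m,)
```

The resulting fixed-length vector ν(S) is what the classifier consumes, regardless of the sample size n.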
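The ±1-labelled training set quoted under Dataset Splits amounts to featurizing each synthetic pair in both directions: +1 for the causal ordering, −1 for the flipped one. The names `build_training_set` and the stand-in featurizer below are hypothetical, used only to make the construction concrete.

```python
import numpy as np

def build_training_set(samples, featurize):
    """samples: list of (x, y) arrays drawn from a synthetic 'Mother'
    distribution. Each pair yields two training examples:
    (nu(x, y), +1) for the causal direction and (nu(y, x), -1) for
    the reversed one, as in the quoted construction."""
    X, labels = [], []
    for x, y in samples:
        X.append(featurize(x, y)); labels.append(+1)
        X.append(featurize(y, x)); labels.append(-1)
    return np.asarray(X), np.asarray(labels)

# Toy illustration with a stand-in featurizer (not Eq. (5)):
toy_featurize = lambda a, b: np.array([a.mean(), b.mean()])
samples = [(np.array([1.0, 2.0]), np.array([3.0, 4.0]))]
X, labels = build_training_set(samples, toy_featurize)
```

Because each synthetic pair appears once per direction, the training set is balanced by construction.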
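The experiment setup quoted above combines three ingredients: a bandwidth γ from the median heuristic, random features approximating a sum of three Gaussian kernels at 0.1γ, γ, and 10γ, and a forest size chosen by cross-validation over {100, 250, 500, 1000, 5000}. The sketch below shows one plausible reading of that recipe, using scikit-learn's current API rather than the sklearn-0.16-git named in the paper; function names are my own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def median_heuristic(Z):
    """Median pairwise Euclidean distance over Z (n, d): a standard
    default for the Gaussian-kernel bandwidth."""
    d = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
    return np.median(d[np.triu_indices_from(d, k=1)])

def sample_mixture_frequencies(m, d, gamma, rng):
    """Random Fourier frequencies approximating the sum of three Gaussian
    kernels with bandwidths 0.1*gamma, gamma, and 10*gamma: each frequency
    is drawn with a bandwidth picked uniformly from the three."""
    bandwidths = rng.choice([0.1 * gamma, gamma, 10.0 * gamma], size=m)
    return rng.normal(size=(m, d)) / bandwidths[:, None]

# Forest size selected by 4-fold cross-validation over the quoted grid.
clf = GridSearchCV(RandomForestClassifier(random_state=0),
                   {"n_estimators": [100, 250, 500, 1000, 5000]}, cv=4)

gamma = median_heuristic(np.array([[0.0], [1.0], [2.0]]))
W = sample_mixture_frequencies(m=8, d=2, gamma=gamma, rng=np.random.default_rng(0))
```

Fitting `clf` on the featurized training set then plays the role of the cross-validated random forest described in the quote; the GBC used for the ChaLearn challenge would be tuned the same way.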