The Randomized Causation Coefficient
Authors: David Lopez-Paz, Krikamol Muandet, Benjamin Recht
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an array of experiments to test the effectiveness of a simple implementation of the presented causal learning framework. Given the use of random embeddings (4) in our classifier, we term our method the Randomized Causation Coefficient (RCC). Throughout our simulations, we featurize each sample S = {(x_i, y_i)}_{i=1}^n as ν(S) = (µ_{k,m}(P_{S_x}), µ_{k,m}(P_{S_y}), µ_{k,m}(P_S)) (5). (...) Figure 1 plots the classification accuracy of RCC, IGCI (Daniusis et al., 2012), and ANM (Mooij et al., 2014) versus the fraction of decisions that the algorithms are forced to make out of the 82 scalar Tübingen cause-effect pairs. (...) We tested RCC at the ChaLearn Fast Causation Coefficient challenge (Guyon, 2014). |
| Researcher Affiliation | Academia | David Lopez-Paz EMAIL Max-Planck-Institute for Intelligent Systems, Spemannstrasse 38, 72076 Tübingen, Germany Krikamol Muandet EMAIL Max-Planck-Institute for Intelligent Systems, Spemannstrasse 38, 72076 Tübingen, Germany Benjamin Recht EMAIL Department of EECS, University of California Berkeley, 387 Soda Hall, Berkeley, CA 94720 |
| Pseudocode | No | No explicit pseudocode or algorithm block is present. The methodology is described using mathematical equations and textual explanations. |
| Open Source Code | Yes | 1. The source code of our experiments is available at https://github.com/lopezpaz/causation_learning_theory. |
| Open Datasets | Yes | 2. The Tübingen cause-effect pairs data set can be downloaded at https://webdav.tuebingen.mpg.de/cause-effect/. We tested RCC at the ChaLearn Fast Causation Coefficient challenge (Guyon, 2014). URL https://www.codalab.org/competitions/1381. |
| Dataset Splits | Yes | Given the small size of this data set, we resort to the synthesis of some Mother distribution to sample our training data from. (...) we construct the synthetic training data {(ν({(x̂_{ij}, ŷ_{ij})}_{j=1}^n), +1)}_{i=1}^N ∪ {(ν({(ŷ_{ij}, x̂_{ij})}_{j=1}^n), −1)}_{i=1}^N, where {(x̂_{ij}, ŷ_{ij})}_{j=1}^n = Ŝ_i, and train our classifier on it. Figure 1 plots the classification accuracy of RCC, IGCI (Daniusis et al., 2012), and ANM (Mooij et al., 2014) versus the fraction of decisions that the algorithms are forced to make out of the 82 scalar Tübingen cause-effect pairs. (...) We trained a Gradient Boosting Classifier (GBC), with hyper-parameters chosen via a 4-fold cross-validation, on the featurizations (5) of the training data. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | Yes | To classify the embeddings (5) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git. |
| Experiment Setup | Yes | In practice, we set m = 1000, and observe no significant improvements when using larger amounts of random features. To classify the embeddings (5) in each of the experiments, we use the random forest implementation from Python's sklearn-0.16-git. The number of trees forming the forest is chosen from the set {100, 250, 500, 1000, 5000}, via cross-validation. (...) Each of these three embeddings has random features sampled to approximate the sum of three Gaussian kernels (2) with hyper-parameters 0.1γ, γ, and 10γ, where γ is set using the median heuristic. (...) We trained a Gradient Boosting Classifier (GBC), with hyper-parameters chosen via a 4-fold cross-validation |
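The featurization quoted above (random-feature approximations of kernel mean embeddings of the two marginals and the joint, with three Gaussian bandwidths set by the median heuristic) can be sketched as follows. This is a minimal reconstruction under our own assumptions: all function names are ours, and the random-Fourier-feature parameterization is the standard one rather than code from the paper's repository.

```python
import numpy as np

def median_heuristic(z):
    # Bandwidth = median of the nonzero pairwise Euclidean distances.
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    return np.median(d[d > 0])

def random_features(z, m, bandwidths, rng):
    # Random Fourier features approximating a sum of Gaussian kernels,
    # one block of m/len(bandwidths) cosine features per bandwidth.
    blocks = []
    m_per = m // len(bandwidths)
    for s in bandwidths:
        w = rng.normal(scale=1.0 / s, size=(z.shape[1], m_per))
        b = rng.uniform(0.0, 2.0 * np.pi, size=m_per)
        blocks.append(np.sqrt(2.0 / m_per) * np.cos(z @ w + b))
    return np.hstack(blocks)

def featurize_sample(x, y, m=1000, rng=None):
    # nu(S) = (mu(P_Sx), mu(P_Sy), mu(P_S)): empirical mean embeddings of
    # the two marginals and the joint, each with m random features spread
    # over the three bandwidths 0.1*gamma, gamma, 10*gamma.
    if rng is None:
        rng = np.random.default_rng(0)
    xy = np.hstack([x, y])
    feats = []
    for z in (x, y, xy):
        g = median_heuristic(z)
        feats.append(random_features(z, m, [0.1 * g, g, 10 * g], rng).mean(0))
    return np.concatenate(feats)
```

A cause-effect pair {(x_i, y_i)} of any sample size n is thus mapped to a fixed-length vector of roughly 3m numbers, which is what makes an off-the-shelf classifier applicable.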
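The classifier selection described in the setup row (a random forest whose number of trees is picked from {100, 250, 500, 1000, 5000} via cross-validation) can be sketched with the modern scikit-learn API. The paper used sklearn-0.16-git; the use of `GridSearchCV`, the 4-fold `cv` value (the paper states 4-fold CV only for the GBC variant), and the placeholder variable names are our assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Cross-validated choice of forest size over the set reported in the paper.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 250, 500, 1000, 5000]},
    cv=4,
)
# train_X holds featurizations nu(S) of synthetic pairs; train_y is +1 for
# X -> Y pairs and -1 for Y -> X pairs (both are placeholders here).
# search.fit(train_X, train_y)
# rcc_score = search.predict_proba(test_X)[:, 1]  # the causation coefficient
```

The forced-decision curves in Figure 1 then follow by ranking test pairs by |rcc_score - 0.5| and abstaining on the least confident fraction.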