Structure Learning for Directed Trees

Authors: Martin E. Jakobsen, Rajen D. Shah, Peter Bühlmann, Jonas Peters

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods. In this section, we investigate the finite-sample performance of CAT and perform simulation experiments investigating the identifiability gap and its lower bound.
Researcher Affiliation | Academia | Martin Emil Jakobsen, Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark; Rajen D. Shah, Statistical Laboratory, University of Cambridge, Cambridge, UK; Peter Bühlmann, Seminar for Statistics, ETH Zurich, Zurich, Switzerland; Jonas Peters, Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
Pseudocode | Yes | Algorithm 1: Causal additive trees (CAT); Algorithm 2: Hypothesis testing of H0(R) using the Check C test; Algorithm 3: Hypothesis testing of H0(R) using the Conv B test
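Algorithm 1 (CAT) scores every candidate directed edge and then extracts an optimal spanning arborescence with the Chu-Liu/Edmonds algorithm. A minimal sketch of that final step, assuming the NetworkX implementation and purely illustrative edge weights (lower weight = preferred edge; with score-style weights one would call `nx.maximum_spanning_arborescence` instead):

```python
import networkx as nx

# Hypothetical pairwise edge weights for 3 variables (not values from the paper);
# entry (j, i) is the cost of choosing the directed edge j -> i.
weights = {(0, 1): 0.2, (1, 0): 0.9, (0, 2): 0.5, (2, 0): 0.8,
           (1, 2): 0.3, (2, 1): 0.7}

G = nx.DiGraph()
for (j, i), w in weights.items():
    G.add_edge(j, i, weight=w)

# Chu-Liu/Edmonds: minimum-weight spanning arborescence over the scored edges.
tree = nx.minimum_spanning_arborescence(G, attr="weight")
print(sorted(tree.edges()))  # -> [(0, 1), (1, 2)]
```

The returned arborescence is the estimated directed tree: every non-root node has exactly one parent, which is what distinguishes this step from an undirected minimum spanning tree.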
Open Source Code | Yes | An R implementation of CAT with options for cross-fitting and pruning is available on GitHub (footnote 8: https://github.com/MartinEmilJakobsen/CAT).
Open Datasets | Yes | We consider the well-known non-synthetic bioinformatics data set of Sachs et al. (2005).
Dataset Splits | Yes | Given a finite sample of size n, we use the first n/2 observations to estimate all possible conditional mean functions... The remaining n - n/2 observations are used to estimate the upper and lower Bonferroni-corrected confidence bounds ˆl = (ˆl_{ji})_{j≠i} and ˆu = (ˆu_{ji})_{j≠i} as defined in Equation (10) of Section 4.
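The splitting scheme described above can be sketched as follows. This is a minimal illustration with a hypothetical data matrix; the actual regression and confidence-bound estimators are the ones defined in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(101, 3))  # hypothetical data: n = 101 observations, p = 3 variables

n = X.shape[0]
X_fit = X[: n // 2]    # first n/2 observations: fit the conditional mean functions
X_bound = X[n // 2 :]  # remaining n - n/2 observations: Bonferroni-corrected bounds
```

Keeping the two halves disjoint is the point of the scheme: the bounds in Equation (10) are computed on data that played no role in fitting the regressions.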
Hardware Specification | No | The paper discusses runtime performance ("The average runtime of CAM and CAT.G in this experiment for p = 64 and n = 500 was 288 and 199 seconds, respectively.") but does not specify the particular hardware (e.g., CPU, GPU models, memory) on which these experiments were run.
Software Dependencies | Yes | We use the R package mgcv (Mixed GAM Computation Vehicle; Wood, 2022) with default settings... We use the implementation of the Chu-Liu/Edmonds algorithm from the R package RBGL (Carey et al., 2021) and the Python implementation of Edmonds' algorithm from the Python package NetworkX (Hagberg et al., 2022). The entropy edge weights used by CAT.E are estimated with the differential entropy estimator of Berrett et al. (2019) as implemented in the CRAN R package IndepTest (Berrett et al., 2018).
Experiment Setup | Yes | For any given directed tree we generate causal functions as sample paths of Gaussian processes with a radial basis function (RBF) kernel and bandwidth parameter of one. Root nodes are mean-zero Gaussian variables with standard deviation sampled uniformly on (1, 2). Furthermore, for each fixed tree and set of causal functions, we introduce at each non-root node additive Gaussian noise with mean zero and standard deviation sampled uniformly on (1/5, 2/5). ...We conduct the experiment for all combinations of α ∈ {0.1, 0.2, ..., 2, 2.5, 3, 3.5, 4} and sample sizes n ∈ {50, 500} for a fixed system size of p = 32. ...The structure learning methods are applied to observational data (853 observations using reagents anti-CD3 and anti-CD28).
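One cause-effect pair under this simulation design can be sketched as below, assuming NumPy; `gp_sample` is an illustrative helper (GP path drawn via a Cholesky factor of the RBF Gram matrix, with a small jitter for numerical stability), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def gp_sample(x, bandwidth=1.0, jitter=1e-6):
    """Draw one sample path of a zero-mean GP with RBF kernel at the points x."""
    d = x[:, None] - x[None, :]
    K = np.exp(-d**2 / (2 * bandwidth**2))          # RBF Gram matrix
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    return L @ rng.standard_normal(len(x))

n = 50
x = rng.uniform(1.0, 2.0) * rng.standard_normal(n)  # root node: sd uniform on (1, 2)
f = gp_sample(x, bandwidth=1.0)                     # causal function, bandwidth 1
sigma = rng.uniform(1 / 5, 2 / 5)                   # noise sd uniform on (1/5, 2/5)
y = f + sigma * rng.standard_normal(n)              # child node = f(parent) + noise
```

For a full tree one would repeat the `gp_sample` step along each edge, feeding the parent's samples in as `x`.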