Structure Learning for Directed Trees
Authors: Martin E. Jakobsen, Rajen D. Shah, Peter Bühlmann, Jonas Peters
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods. In this section, we investigate the finite-sample performance of CAT and perform simulation experiments investigating the identifiability gap and its lower bound. |
| Researcher Affiliation | Academia | Martin Emil Jakobsen, Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark; Rajen D. Shah, Statistical Laboratory, University of Cambridge, Cambridge, UK; Peter Bühlmann, Seminar for Statistics, ETH Zurich, Zurich, Switzerland; Jonas Peters, Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark |
| Pseudocode | Yes | Algorithm 1: Causal additive trees (CAT); Algorithm 2: Hypothesis testing of H0(R) using the Check C test; Algorithm 3: Hypothesis testing of H0(R) using the Conv B test |
| Open Source Code | Yes | An R implementation of CAT with options for cross-fitting and pruning is available on GitHub: https://github.com/MartinEmilJakobsen/CAT |
| Open Datasets | Yes | We consider the well-known non-synthetic bio-informatics data set considered by Sachs et al. (2005). |
| Dataset Splits | Yes | Given a finite sample of size n, we use the first n/2 observations to estimate all possible conditional mean functions... The remaining n - n/2 observations are used to estimate the upper and lower Bonferroni-corrected confidence bounds l̂ = (l̂_ji)_{j≠i} and û = (û_ji)_{j≠i} as defined in Equation (10) of Section 4. |
| Hardware Specification | No | The paper discusses runtime performance ("The average runtime of CAM and CAT.G in this experiment for p = 64 and n = 500 was 288 and 199 seconds, respectively.") but does not specify the particular hardware (e.g., CPU, GPU models, memory) on which these experiments were run. |
| Software Dependencies | Yes | We use the R-package mgcv (Mixed GAM Computation Vehicle, Wood, 2022) with default settings... We use the implementation of the Chu-Liu/Edmonds algorithm from the R-package RBGL (Carey et al., 2021) and the Python implementation of Edmonds' version from the Python-package NetworkX (Hagberg et al., 2022). The entropy edge weights used by CAT.E are estimated with the differential entropy estimator of Berrett et al. (2019) as implemented in the CRAN R-package IndepTest (Berrett et al., 2018). |
| Experiment Setup | Yes | For any given directed tree we generate causal functions by sample paths of Gaussian processes with radial basis function (RBF) kernel and bandwidth parameter of one. Root nodes are mean-zero Gaussian variables with standard deviation sampled uniformly on (1, 2). Furthermore, for each fixed tree and set of causal functions, we introduce at each non-root node additive Gaussian noise with mean zero and standard deviation sampled uniformly on (1/5, 2/5). ...We conduct the experiment for all combinations of α ∈ {0.1, 0.2, ..., 2, 2.5, 3, 3.5, 4} and sample sizes n ∈ {50, 500} for a fixed system size of p = 32. ...The structure learning methods are applied to observational data (853 observations using reagents anti-CD3 and anti-CD28). |
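The simulation design described under Experiment Setup (GP-drawn causal functions on a directed tree, Gaussian roots, additive Gaussian noise) can be sketched as follows. This is a minimal NumPy illustration, not the authors' R code: the `sample_tree_data` helper, the exact RBF parameterisation, and the `parent`-array tree encoding are all assumptions made here for clarity; the GP sample path is approximated by a finite-dimensional draw at the observed parent values.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x, bandwidth=1.0):
    # K(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)); the exact
    # parameterisation of "bandwidth parameter of one" is an assumption.
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * bandwidth ** 2))

def sample_tree_data(parent, n):
    """Simulate n observations from a directed tree (hypothetical helper).

    parent[j] is the parent index of node j, or -1 for the root.
    Assumes nodes are topologically ordered (parent[j] < j).
    """
    p = len(parent)
    X = np.zeros((n, p))
    for j in range(p):
        if parent[j] == -1:
            # Root: mean-zero Gaussian, sd ~ Uniform(1, 2).
            sd = rng.uniform(1.0, 2.0)
            X[:, j] = rng.normal(0.0, sd, n)
        else:
            xp = X[:, parent[j]]
            # Causal function: GP sample path at the observed parent
            # values (jitter added for numerical stability).
            K = rbf_kernel(xp) + 1e-8 * np.eye(n)
            f = rng.multivariate_normal(np.zeros(n), K)
            # Additive noise: mean zero, sd ~ Uniform(1/5, 2/5).
            sd = rng.uniform(0.2, 0.4)
            X[:, j] = f + rng.normal(0.0, sd, n)
    return X

# Small example tree: node 0 is the root; 1 and 2 are its children; 3 is a child of 1.
X = sample_tree_data(parent=[-1, 0, 0, 1], n=200)
print(X.shape)  # (200, 4)
```

The paper's actual experiments use p = 32 nodes and n ∈ {50, 500}; the tree above is kept tiny only to make the structure explicit.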