A Distribution Free Conditional Independence Test with Applications to Causal Discovery

Authors: Zhanrui Cai, Runze Li, Yaowu Zhang

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of the method is illustrated through extensive simulations and a real application on causal discovery. We conduct numerical comparisons and apply the proposed test to causal discovery in directed acyclic graphs in Section 4.
Researcher Affiliation | Academia | Zhanrui Cai, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Runze Li, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA; Yaowu Zhang, Research Institute for Interdisciplinary Sciences, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
Pseudocode | Yes | In what follows, we describe the simulation-based procedure in detail to decide the critical value $c_\alpha$. 1. Generate $\{U_i, V_i, W_i\}$, $i = 1, \ldots, n$, independently from mutually independent standard uniform distributions; 2. Compute the statistic $\hat\rho$ based on $\{U_i, V_i, W_i\}$, $i = 1, \ldots, n$, i.e., $\hat\rho = c_0 n^{-2} \sum_{i,j=1}^{n} \big\{ \big( e^{-\lvert U_i - U_j \rvert} + e^{-U_i} + e^{U_i - 1} + e^{-U_j} + e^{U_j - 1} + 2e^{-1} - 4 \big) \big( e^{-\lvert V_i - V_j \rvert} + e^{-V_i} + e^{V_i - 1} + e^{-V_j} + e^{V_j - 1} + 2e^{-1} - 4 \big) e^{-\lvert W_i - W_j \rvert} \big\}$. (4) 3. Repeat Steps 1-2 $B$ times and set $c_\alpha$ to be the upper $\alpha$ quantile of the estimated $\hat\rho$ obtained from the randomly simulated samples.
Open Source Code | No | The paper does not explicitly provide a link to source code or an affirmative statement about its release in the main body or appendices.
Open Datasets | Yes | We analyze a real data set originally from the National Institute of Diabetes and Digestive and Kidney Diseases (Smith et al., 1988).
Dataset Splits | No | The paper mentions the use of synthetic data generated for various models (M1-M18) and a real dataset with n=392 samples, but it does not specify any training/test/validation splits for these datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | We implement the causal algorithms by the R package pcalg (Kalisch et al., 2012). (No version number for `pcalg` or R itself is provided.)
Experiment Setup | Yes | The sample size $n$ is set to be 100. The simulated null distributions based on $n\hat\rho(X, Y \mid Z)$ and $n\hat\rho_0(X, Y \mid Z)$ are depicted in Figure 1. To show the insensitivity of the choice of the bandwidth, we set the bandwidths to be $c h_0$, where $c = 0.5$, 1, and 2, respectively, and $h_0$ is the bandwidth obtained by the rule of thumb. The critical values of the CIT are obtained by conducting 1000 simulations. We conduct 500 replications for each scenario.
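
The simulation-based procedure quoted in the Pseudocode row can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation (the paper's experiments use R). It assumes that equation (4) is read as the product of two Laplace kernels, each double-centered under the Uniform(0, 1) distribution (one in U, one in V), multiplied by an uncentered Laplace kernel in W. The constant `c0` is not defined in the excerpt above, so it is treated here as a hypothetical parameter with a placeholder default of 1.0. All function names are illustrative.

```python
import math
import random


def centered_kernel(a, b):
    """Laplace kernel exp(-|a - b|), double-centered under Uniform(0, 1).

    Integrating this function over b in [0, 1] gives 0 for every a,
    which is where the constants 2e^{-1} - 4 come from.
    """
    return (math.exp(-abs(a - b))
            + math.exp(-a) + math.exp(a - 1.0)
            + math.exp(-b) + math.exp(b - 1.0)
            + 2.0 * math.exp(-1.0) - 4.0)


def rho_hat(u, v, w, c0=1.0):
    """Statistic of equation (4); c0 is a placeholder (assumption)."""
    n = len(u)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (centered_kernel(u[i], u[j])
                      * centered_kernel(v[i], v[j])
                      * math.exp(-abs(w[i] - w[j])))
    return c0 * total / n ** 2


def simulate_critical_value(n, alpha=0.05, B=1000, c0=1.0, seed=0):
    """Steps 1-3: simulate B null statistics, return the upper-alpha quantile."""
    rng = random.Random(seed)
    stats = []
    for _ in range(B):
        # Step 1: mutually independent standard uniform samples.
        u = [rng.random() for _ in range(n)]
        v = [rng.random() for _ in range(n)]
        w = [rng.random() for _ in range(n)]
        # Step 2: compute the statistic on the simulated sample.
        stats.append(rho_hat(u, v, w, c0))
    # Step 3: empirical upper-alpha quantile of the simulated statistics.
    stats.sort()
    k = max(0, min(B - 1, math.ceil((1.0 - alpha) * B) - 1))
    return stats[k]
```

With the settings reported in the Experiment Setup row, the call would be `simulate_critical_value(n=100, alpha=0.05, B=1000)`. The double loop is O(n^2) per replication, so a vectorized implementation would be preferable at that scale. Because the simulated samples do not depend on the observed data, the critical value can be computed once per (n, alpha) pair and reused.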