A Distribution Free Conditional Independence Test with Applications to Causal Discovery
Authors: Zhanrui Cai, Runze Li, Yaowu Zhang
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of the method is illustrated through extensive simulations and a real application on causal discovery. We conduct numerical comparisons and apply the proposed test to causal discovery in directed acyclic graphs in Section 4. |
| Researcher Affiliation | Academia | Zhanrui Cai, Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Runze Li, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA; Yaowu Zhang, Research Institute for Interdisciplinary Sciences, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China |
| Pseudocode | Yes | In what follows, we describe in detail the simulation-based procedure for determining the critical value $c_\alpha$. 1. Generate $\{U_i, V_i, W_i\}$, $i = 1, \ldots, n$, independently from mutually independent standard uniform distributions; 2. Compute the statistic $\hat\rho$ based on $\{U_i, V_i, W_i\}$, $i = 1, \ldots, n$, i.e., $\hat\rho = c_0 n^{-2} \sum_{i,j=1}^{n} \{ (e^{-\lvert U_i - U_j \rvert} + e^{-U_i} + e^{U_i - 1} + e^{-U_j} + e^{U_j - 1} + 2e^{-1} - 4)(e^{-\lvert V_i - V_j \rvert} + e^{-V_i} + e^{V_i - 1} + e^{-V_j} + e^{V_j - 1} + 2e^{-1} - 4)\, e^{-\lvert W_i - W_j \rvert} \}$ (4); 3. Repeat Steps 1-2 $B$ times and set $c_\alpha$ to be the upper $\alpha$ quantile of the $B$ simulated values of $\hat\rho$. |
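A minimal Python/NumPy sketch of Steps 1-3, written for this summary rather than taken from the authors' code. It assumes a univariate $W$ and treats $c_0$ as a user-supplied constant (defaulting to 1 here, because its value is not stated in this summary). The helper names `centered_laplace_kernel`, `rho_hat` and `simulate_critical_value` are hypothetical. The extra terms in each bracket of (4) come from centring the kernel $e^{-\lvert u - u' \rvert}$ at its uniform mean. For $U' \sim \mathrm{Uniform}(0,1)$, $E\, e^{-\lvert u - U' \rvert} = 2 - e^{-u} - e^{u-1}$ and $E\, e^{-\lvert U - U' \rvert} = 2e^{-1}$.

```python
import numpy as np

def centered_laplace_kernel(u):
    """Centred kernel matrix for a sample assumed to be Uniform(0, 1).

    Entry (i, j) is e^{-|u_i-u_j|} + e^{-u_i} + e^{u_i-1} + e^{-u_j} + e^{u_j-1} + 2e^{-1} - 4,
    i.e. the Laplace kernel minus its two uniform marginal means plus its overall mean.
    """
    u = np.asarray(u, dtype=float)
    ui, uj = u[:, None], u[None, :]
    return (np.exp(-np.abs(ui - uj))
            + np.exp(-ui) + np.exp(ui - 1)
            + np.exp(-uj) + np.exp(uj - 1)
            + 2 * np.exp(-1) - 4)

def rho_hat(u, v, w, c0=1.0):
    """Statistic (4): c0 / n^2 times the sum over all (i, j) pairs.

    Assumes univariate w; c0 is a placeholder constant (hypothetical default of 1).
    """
    n = len(u)
    w = np.asarray(w, dtype=float)
    kw = np.exp(-np.abs(w[:, None] - w[None, :]))  # uncentred kernel on W
    return c0 * np.sum(centered_laplace_kernel(u) * centered_laplace_kernel(v) * kw) / n**2

def simulate_critical_value(n, alpha=0.05, B=1000, c0=1.0, seed=None):
    """Steps 1-3: upper-alpha quantile of rho_hat over B independent uniform samples."""
    rng = np.random.default_rng(seed)
    stats = np.empty(B)
    for b in range(B):
        u, v, w = rng.uniform(size=(3, n))  # Step 1: mutually independent U(0, 1) draws
        stats[b] = rho_hat(u, v, w, c0)     # Step 2: statistic on the simulated sample
    return np.quantile(stats, 1 - alpha)    # Step 3: upper alpha quantile

# Example: critical value for n = 100, B = 1000, alpha = 0.05
c_alpha = simulate_critical_value(100, alpha=0.05, B=1000, seed=0)
```

Because $\hat\rho$ and $c_\alpha$ both scale linearly with $c_0$, the accept/reject decision does not depend on the value chosen for $c_0$. For the scaled statistic $n\hat\rho$, the corresponding critical value is $n c_\alpha$.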
| Open Source Code | No | The paper does not explicitly provide a link to source code or an affirmative statement about its release in the main body or appendices. |
| Open Datasets | Yes | We analyze a real data set originally from the National Institute of Diabetes and Digestive and Kidney Diseases (Smith et al., 1988). |
| Dataset Splits | No | The paper mentions the use of synthetic data generated for various models (M1-M18) and a real dataset with n=392 samples, but it does not specify any training/test/validation splits for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | We implement the causal algorithms by the R package pcalg (Kalisch et al., 2012). (No version number for `pcalg` or R itself is provided). |
| Experiment Setup | Yes | The sample size $n$ is set to 100. The simulated null distributions of $n\hat\rho(X, Y \mid Z)$ and $n\hat\rho_0(X, Y \mid Z)$ are depicted in Figure 1. To show that the test is insensitive to the choice of bandwidth, we set the bandwidth to $c h_0$ with $c = 0.5$, $1$ and $2$, where $h_0$ is the bandwidth obtained by the rule of thumb. The critical values of the CIT are obtained from 1000 simulations. We conduct 500 replications for each scenario. |
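The bandwidth $c h_0$ enters through nonparametric estimation of the conditional distribution functions that produce $U$ and $V$. The sketch below shows how the three scalings $c = 0.5, 1, 2$ could be applied. Several pieces are assumptions and not details confirmed by this summary: a Gaussian-kernel Nadaraya-Watson estimator of $F(x \mid z)$, Silverman's normal-reference rule standing in for the unspecified "rule of thumb" $h_0$, and a toy data-generating model that is not one of M1-M18. The function names are hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidth(z):
    """Hypothetical h0: Silverman's normal-reference rule 1.06 * sd * n^(-1/5)."""
    z = np.asarray(z, dtype=float)
    return 1.06 * np.std(z, ddof=1) * len(z) ** (-0.2)

def conditional_cdf(x, z, h):
    """Nadaraya-Watson estimate of F(x_i | z_i), i = 1..n, Gaussian kernel, bandwidth h."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    # weights[i, j] = K((z_i - z_j) / h): how much observation j counts at z_i
    weights = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    # below[i, j] = 1{x_j <= x_i}
    below = (x[None, :] <= x[:, None]).astype(float)
    return (weights * below).sum(axis=1) / weights.sum(axis=1)

# Toy data satisfying X independent of Y given Z (hypothetical model, not M1-M18)
rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z ** 2 + rng.normal(size=n)

h0 = rule_of_thumb_bandwidth(z)
transformed = {}
for c in (0.5, 1.0, 2.0):
    u = conditional_cdf(x, z, c * h0)  # estimated U_i = F(X_i | Z_i)
    v = conditional_cdf(y, z, c * h0)  # estimated V_i = F(Y_i | Z_i)
    transformed[c] = (u, v)
```

Bandwidth insensitivity would then show up as similar rejection rates across the three values of $c$ over the 500 replications. As a sanity check on the estimator, a very large bandwidth makes every weight nearly equal, so the estimate collapses to the marginal empirical CDF (rank divided by $n$).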