reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Distribution Free Conditional Independence Test with Applications to Causal Discovery

Authors: Zhanrui Cai, Runze Li, Yaowu Zhang

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The effectiveness of the method is illustrated through extensive simulations and a real application on causal discovery. We conduct numerical comparisons and apply the proposed test to causal discovery in directed acyclic graphs in Section 4.
Researcher Affiliation	Academia	Zhanrui Cai EMAIL Department of Statistics and Data Science Carnegie Mellon University Pittsburgh, PA, 15213, USA, Runze Li EMAIL Department of Statistics The Pennsylvania State University University Park, PA 16802, USA, Yaowu Zhang EMAIL Research Institute for Interdisciplinary Sciences School of Information Management and Engineering Shanghai University of Finance and Economics Shanghai, 200433, China
Pseudocode	Yes	In what follows, we describe the simulation-based procedure in detail to decide the critical value $c_\alpha$. 1. Generate $\{U_i, V_i, W_i\}$, $i = 1, \dots, n$, independently from mutually independent standard uniform distributions; 2. Compute the statistic $\hat{\rho}$ based on $\{U_i, V_i, W_i\}$, $i = 1, \dots, n$, i.e., $\hat{\rho} = c_0 n^{-2} \sum_{i,j=1}^{n} \{ (e^{-\lvert U_i - U_j \rvert} + e^{-U_i} + e^{U_i - 1} + e^{-U_j} + e^{U_j - 1} + 2e^{-1} - 4)(e^{-\lvert V_i - V_j \rvert} + e^{-V_i} + e^{V_i - 1} + e^{-V_j} + e^{V_j - 1} + 2e^{-1} - 4) e^{-\lvert W_i - W_j \rvert} \}$. (4) 3. Repeat Steps 1-2 for B times and set $c_\alpha$ to be the upper $\alpha$ quantile of the estimated $\hat{\rho}$ obtained from the randomly simulated samples.
Open Source Code	No	The paper does not explicitly provide a link to source code or an affirmative statement about its release in the main body or appendices.
Open Datasets	Yes	We analyze a real data set originally from the National Institute of Diabetes and Digestive and Kidney Diseases (Smith et al., 1988).
Dataset Splits	No	The paper mentions the use of synthetic data generated for various models (M1-M18) and a real dataset with n=392 samples, but it does not specify any training/test/validation splits for these datasets.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies	No	We implement the causal algorithms by the R package pcalg (Kalisch et al., 2012). (No version number for `pcalg` or R itself is provided).
Experiment Setup	Yes	The sample size n is set to be 100. The simulated null distributions based on $n\hat{\rho}(X, Y \mid Z)$ and $n\hat{\rho}_0(X, Y \mid Z)$ are depicted in Figure 1. To show the insensitivity of the choice of the bandwidth, we set the bandwidths to be $c h_0$, where c = 0.5, 1, and 2, respectively, and $h_0$ is the bandwidth obtained by the rule of thumb. The critical values of the CIT are obtained by conducting 1000 simulations. We conduct 500 replications for each scenario.
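
The simulation-based procedure quoted in the Pseudocode row can be sketched in Python roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`centered_kernel`, `rho_hat`, `critical_value`) are hypothetical, and the normalizing constant c0 is not given in the excerpt above, so it is exposed as a parameter with a placeholder default of 1.0. The defaults n_sim = 1000 and the example n = 100 follow the Experiment Setup row.

```python
import numpy as np


def centered_kernel(x):
    """n-by-n matrix with entries
    e^{-|x_i - x_j|} + e^{-x_i} + e^{x_i - 1} + e^{-x_j} + e^{x_j - 1} + 2e^{-1} - 4.
    """
    a = np.exp(-x) + np.exp(x - 1.0)  # e^{-x_i} + e^{x_i - 1}
    pairwise = np.exp(-np.abs(x[:, None] - x[None, :]))
    return pairwise + a[:, None] + a[None, :] + 2.0 * np.exp(-1.0) - 4.0


def rho_hat(u, v, w, c0=1.0):
    """Statistic of Step 2. c0 is a placeholder (assumption): its value
    is not stated in the quoted excerpt."""
    n = len(u)
    kernel_w = np.exp(-np.abs(w[:, None] - w[None, :]))
    total = np.sum(centered_kernel(u) * centered_kernel(v) * kernel_w)
    return c0 * total / n**2


def critical_value(n, alpha=0.05, n_sim=1000, c0=1.0, seed=0):
    """Steps 1-3: simulate independent uniforms, recompute the statistic
    n_sim times, and return the upper-alpha quantile."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_sim)
    for b in range(n_sim):
        u, v, w = rng.uniform(size=(3, n))  # Step 1: independent U(0, 1) draws
        stats[b] = rho_hat(u, v, w, c0)     # Step 2
    return float(np.quantile(stats, 1.0 - alpha))  # Step 3
```

For example, `critical_value(100, alpha=0.05)` would approximate the critical value for the reported sample size n = 100, up to the unspecified constant c0. Because the simulated samples do not depend on the observed data, the value can be computed once per (n, alpha) pair and reused.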