Mediation Analysis for Probabilities of Causation
Authors: Yuta Kawakami, Jin Tian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the practical application of our results through an analysis of a real-world psychology dataset. We perform numerical experiments to illustrate the properties of the estimators from finite sample size. The ground truths of T-PNS, ND-PNS, and NI-PNS are 0.074, 0.066, and 0.008. When N = 100, the estimates are T-PNS: 0.083 (CI: [0.000, 0.228]), ND-PNS: 0.074 (CI: [0.000, 0.220]), NI-PNS: 0.009 (CI: [0.000, 0.046]). When N = 1000, the estimates are T-PNS: 0.075 (CI: [0.029, 0.125]), ND-PNS: 0.068 (CI: [0.021, 0.116]), NI-PNS: 0.007 (CI: [0.000, 0.017]). When N = 10000, the estimates are T-PNS: 0.074 (CI: [0.060, 0.088]), ND-PNS: 0.067 (CI: [0.052, 0.082]), NI-PNS: 0.008 (CI: [0.005, 0.011]). |
| Researcher Affiliation | Academia | Yuta Kawakami, Jin Tian; Mohamed bin Zayed University of Artificial Intelligence, UAE |
| Pseudocode | No | The paper describes methods using mathematical formulas and propositions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions that the dataset is available through the R package mediation, but this refers to a third-party tool for data access, not their own implementation code. |
| Open Datasets | Yes | We take up a dataset from the Job Search Intervention Study (JOBS II) (Vinokur and Schul 1997). This dataset is open through the R package mediation (https://cran.r-project.org/web/packages/mediation/index.html). |
| Dataset Splits | No | For the real-world psychology dataset (JOBS II), the paper states: "The sample size is 899 with no missing values." However, it does not provide any specific details about how this dataset was split into training, validation, or test sets for their application. For simulated experiments, it states "simulate 1000 times with the sample size N = 100, 1000, 10000" but this is not a dataset split for evaluation. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the simulated experiments or the analysis on the real-world dataset. |
| Software Dependencies | No | The paper mentions the use of an "R package mediation" for accessing the JOBS II dataset but does not specify a version number for this package or any other software dependencies used for their analysis or simulations. |
| Experiment Setup | Yes | For simulated experiments: "We assume the following SCM: X := Bern(0.5), M := Bern(π(X)), Y := Bern(π(X + M)), where π(x) = exp(1 + 0.5x)/(1 + exp(1 + 0.5x)). Bern(z) represents a Bernoulli distribution with probability z. X, M, and Y are all binary variables. We simulate 1000 times with the sample size N = 100, 1000, 10000, respectively, and assess the means and 95% confidence intervals (CI) of the estimators." For the real-world dataset: "We let the threshold of the depression be y = 3 in all the definitions of PoC variants, and let x = 0 and x = 1." |
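
The simulated setup quoted in the Experiment Setup row can be sketched in Python as follows. This is a minimal illustration, not the authors' code (none is released): it draws observational data from the stated SCM and repeats the draw to summarize a statistic by its mean and a 95% percentile interval. The function names, the use of NumPy, the percentile-based interval, and the `risk_difference` statistic are all assumptions made for illustration; the paper's T-PNS, ND-PNS, and NI-PNS estimators are not given in this summary and are not implemented here. The sketch also reproduces only the observational distribution of (X, M, Y); counterfactual quantities depend on how the exogenous noise enters the SCM, which this excerpt does not specify.

```python
import numpy as np


def pi(x):
    """Link function from the quoted setup: exp(1 + 0.5x) / (1 + exp(1 + 0.5x))."""
    z = 1.0 + 0.5 * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


def simulate_scm(n, rng):
    """Draw n observations of binary (X, M, Y) from the quoted SCM."""
    x = rng.binomial(1, 0.5, size=n)   # X := Bern(0.5)
    m = rng.binomial(1, pi(x))         # M := Bern(pi(X))
    y = rng.binomial(1, pi(x + m))     # Y := Bern(pi(X + M))
    return x, m, y


def risk_difference(x, y):
    """Hypothetical stand-in statistic: P(Y=1 | X=1) - P(Y=1 | X=0).

    This is NOT one of the paper's PoC estimators; it only gives the
    Monte Carlo loop below something concrete to summarize.
    """
    return y[x == 1].mean() - y[x == 0].mean()


def monte_carlo_summary(n, n_reps=1000, seed=0):
    """Repeat the simulation n_reps times; return the mean and a 95% percentile interval."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_reps)
    for r in range(n_reps):
        x, _, y = simulate_scm(n, rng)
        stats[r] = risk_difference(x, y)
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return stats.mean(), lo, hi


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        mean, lo, hi = monte_carlo_summary(n)
        print(f"N = {n}: mean = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Replacing `risk_difference` with an actual estimator would follow the same pattern: compute it once per simulated dataset, then summarize across the 1000 repetitions for each sample size.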