reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mediation Analysis for Probabilities of Causation

Authors: Yuta Kawakami, Jin Tian

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the practical application of our results through an analysis of a real-world psychology dataset. We perform numerical experiments to illustrate the properties of the estimators from finite sample size. The ground truths of T-PNS, ND-PNS, and NI-PNS are 0.074, 0.066, and 0.008. When N = 100, the estimates are T-PNS: 0.083 (CI: [0.000, 0.228]), ND-PNS: 0.074 (CI: [0.000, 0.220]), NI-PNS: 0.009 (CI: [0.000, 0.046]). When N = 1000, the estimates are T-PNS: 0.075 (CI: [0.029, 0.125]), ND-PNS: 0.068 (CI: [0.021, 0.116]), NI-PNS: 0.007 (CI: [0.000, 0.017]). When N = 10000, the estimates are T-PNS: 0.074 (CI: [0.060, 0.088]), ND-PNS: 0.067 (CI: [0.052, 0.082]), NI-PNS: 0.008 (CI: [0.005, 0.011]).
Researcher Affiliation	Academia	Yuta Kawakami, Jin Tian; Mohamed bin Zayed University of Artificial Intelligence, UAE
Pseudocode	No	The paper describes methods using mathematical formulas and propositions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described. It mentions that the dataset is available through the R package mediation, but this refers to a third-party tool for data access, not their own implementation code.
Open Datasets	Yes	We take up a dataset from the Job Search Intervention Study (JOBS II) (Vinokur and Schul 1997). This dataset is open through the R package mediation (https://cran.r-project.org/web/packages/mediation/index.html).
Dataset Splits	No	For the real-world psychology dataset (JOBS II), the paper states: "The sample size is 899 with no missing values." However, it does not provide any specific details about how this dataset was split into training, validation, or test sets for their application. For simulated experiments, it states "simulate 1000 times with the sample size N = 100, 1000, 10000" but this is not a dataset split for evaluation.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the simulated experiments or the analysis on the real-world dataset.
Software Dependencies	No	The paper mentions the use of an "R package mediation" for accessing the JOBS II dataset but does not specify a version number for this package or any other software dependencies used for their analysis or simulations.
Experiment Setup	Yes	For simulated experiments: "We assume the following SCM: X := Bern(0.5), M := Bern(π(X)), Y := Bern(π(X + M)), where π(x) = exp(1 + 0.5x)/(1 + exp(1 + 0.5x)). Bern(z) represents a Bernoulli distribution with probability z. X, M, and Y are all binary variables. We simulate 1000 times with the sample size N = 100, 1000, 10000, respectively, and assess the means and 95% confidence intervals (CI) of the estimators." For the real-world dataset: "We let the threshold of the depression be y = 3 in all the definitions of PoC variants, and let x = 0 and x = 1."
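
Since the paper provides no source code, the simulated setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names (`pi`, `simulate_scm`, `placeholder_statistic`, `run_replications`) are hypothetical, the statistic is an illustrative stand-in rather than the paper's T-PNS, ND-PNS, or NI-PNS estimators, and the percentile-based 95% interval over replications is an assumption about how the intervals might be computed.

```python
import numpy as np


def pi(x):
    """Logistic link from the quoted setup: exp(1 + 0.5x) / (1 + exp(1 + 0.5x))."""
    z = 1.0 + 0.5 * np.asarray(x, dtype=float)
    return np.exp(z) / (1.0 + np.exp(z))


def simulate_scm(n, rng):
    """Draw n samples from X := Bern(0.5), M := Bern(pi(X)), Y := Bern(pi(X + M))."""
    x = rng.binomial(1, 0.5, size=n)
    m = rng.binomial(1, pi(x))
    y = rng.binomial(1, pi(x + m))
    return x, m, y


def placeholder_statistic(x, y):
    """Illustrative stand-in only (NOT one of the paper's PoC estimators):
    the empirical difference P(Y=1 | X=1) - P(Y=1 | X=0)."""
    if x.min() == x.max():
        # One treatment arm is empty, so the difference is undefined.
        return np.nan
    return y[x == 1].mean() - y[x == 0].mean()


def run_replications(n, n_reps=1000, seed=0):
    """Repeat the simulation n_reps times; return the mean of the statistic and
    a percentile-based 95% interval (assumed, not confirmed by the source)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        x, _, y = simulate_scm(n, rng)
        estimates[r] = placeholder_statistic(x, y)
    lo, hi = np.nanpercentile(estimates, [2.5, 97.5])
    return float(np.nanmean(estimates)), float(lo), float(hi)


if __name__ == "__main__":
    # Sample sizes taken from the quoted setup.
    for n in (100, 1000, 10000):
        mean, lo, hi = run_replications(n)
        print(f"N = {n}: mean = {mean:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Swapping `placeholder_statistic` for an implementation of the paper's estimators would be the step needed to compare against the values reported in the Research Type row. The seed and the use of NumPy's `default_rng` are choices made here for repeatability, since the paper specifies neither.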