Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency

Authors: Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental The theory is complemented with simulation experiments on the finite sample behavior of the estimators. ... 5. Simulation Studies This section presents two simulation studies. The first simulation study compares the asymptotic behaviors of semiparametric plug-in estimators introduced in Section 4. The considered ACE estimators are consistent and reach the respective efficiency bounds because they are based on √n-consistent estimators of the nuisance parameters. Therefore, we draw attention to the comparison of the estimators' empirical variances. In the second simulation study, we investigate the robustness of the estimators under model misspecification.
Researcher Affiliation Academia Tetiana Gorbach (EMAIL), Xavier de Luna (EMAIL): Department of Statistics, Umeå University, 901 87 Umeå, Sweden. Juha Karvanen (EMAIL): Department of Mathematics and Statistics, P.O. Box 35 (MaD), FI-40014 University of Jyväskylä, Jyväskylä, Finland. Ingeborg Waernbaum (EMAIL): Department of Statistics, Box 513, 751 20 Uppsala University, Uppsala, Sweden.
Pseudocode No The paper describes methods using mathematical notation and theoretical derivations, but it does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 4 discusses "Semiparametric Estimation via Estimating Equation Estimators" but only describes the form of the estimating equation without presenting a structured algorithm.
Open Source Code Yes All computations were performed in R (R Core Team, 2020); the code is available at https://github.com/tetianagorbach/semiparametric_inference_ACE_BD_FD_TD_efficiency_robustness.
Open Datasets No The observed data is generated according to consistency assumptions as follows: C ∼ Bernoulli(0.5), A | C ∼ Bernoulli(expit(C)), Z | A, C ∼ N(βA, 1), Y | Z, A, C ∼ N(γ1Z + γ2C, 1). The paper describes how data was generated for simulations, not the use of external publicly available datasets with access information.
Dataset Splits No We consider samples of sizes n = 50, 100, 500, 1000, 5000, 10000, 20000, 30000, 40000, 50000. For each sample size, K = 1000 replicates were simulated. This describes the sampling strategy for simulations, not specific training/test/validation splits for an existing dataset.
Hardware Specification No This research was conducted using the resources of High Performance Computing Center North (HPC2N). This mentions a computing resource but lacks specific hardware details such as CPU/GPU models or memory.
Software Dependencies Yes All computations were performed in R (R Core Team, 2020)
Experiment Setup Yes The observed data is generated according to consistency assumptions as follows: C ∼ Bernoulli(0.5), A | C ∼ Bernoulli(expit(C)), Z | A, C ∼ N(βA, 1), Y | Z, A, C ∼ N(γ1Z + γ2C, 1). ... Here, we vary the effect of the mediator Z and covariate C on the outcome by considering eight data generating mechanisms corresponding to all combinations of β, γ1, γ2 ∈ {0.5, 1.5}. ... the parameters of all outcome models, E(Y | A, Z, C), E(Y | Z, C), and E(Y | A, Z), were estimated using ordinary least squares. Parameter β in the mediator model was also estimated using ordinary least squares. The density of a normal distribution was used in the estimation of p(Z | A). Further, p(A | C) was estimated using iteratively reweighted least squares method in the corresponding logistic regression, and p(A = 1) and p(C = 1) were consistently estimated using the proportion of A = 1 and C = 1, respectively.
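
The data-generating mechanism quoted in the Experiment Setup row can be sketched in a few lines. The block below is an illustrative Python sketch, not the authors' R code. The function names (`simulate`, `back_door_ols`, `front_door_ols`, `monte_carlo`) are hypothetical. The two estimators are simple ordinary-least-squares plug-ins chosen for brevity, not the semiparametric estimating-equation estimators studied in the paper. Reading the quoted mechanism as structural, with A affecting Y only through Z, the average causal effect of A on Y works out to β·γ1; that value is a derivation made here, not a figure quoted from the paper.

```python
import numpy as np


def expit(x):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n, beta, gamma1, gamma2, rng):
    """Draw n observations (C, A, Z, Y) from the quoted mechanism."""
    c = rng.binomial(1, 0.5, size=n)              # C ~ Bernoulli(0.5)
    a = rng.binomial(1, expit(c))                 # A | C ~ Bernoulli(expit(C))
    z = rng.normal(beta * a, 1.0)                 # Z | A, C ~ N(beta * A, 1)
    y = rng.normal(gamma1 * z + gamma2 * c, 1.0)  # Y | Z, A, C ~ N(g1*Z + g2*C, 1)
    return c, a, z, y


def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(X, y, rcond=None)[0]


def back_door_ols(c, a, y):
    """Simplified back-door plug-in: coefficient of A in E(Y | A, C)."""
    return ols([a, c], y)[1]


def front_door_ols(a, z, y):
    """Simplified front-door plug-in: (slope of Z on A) times
    (slope of Y on Z, holding A fixed)."""
    beta_hat = ols([a], z)[1]
    gamma1_hat = ols([z, a], y)[1]
    return beta_hat * gamma1_hat


def monte_carlo(n, k, beta, gamma1, gamma2, seed=0):
    """Mean and empirical variance of both estimators over k replicates."""
    rng = np.random.default_rng(seed)
    est = np.empty((k, 2))
    for i in range(k):
        c, a, z, y = simulate(n, beta, gamma1, gamma2, rng)
        est[i] = back_door_ols(c, a, y), front_door_ols(a, z, y)
    return est.mean(axis=0), est.var(axis=0, ddof=1)


if __name__ == "__main__":
    # Assumed illustrative setting: one of the eight (beta, gamma1, gamma2) combinations.
    mean, var = monte_carlo(n=500, k=200, beta=0.5, gamma1=1.5, gamma2=0.5)
    print("ACE implied by the mechanism (beta * gamma1):", 0.5 * 1.5)
    print("mean estimates (back-door, front-door):", mean)
    print("empirical variances (back-door, front-door):", var)
```

Looping `monte_carlo` over the sample sizes and the eight parameter combinations listed in the table, with K = 1000 replicates, would mirror the layout of the first simulation study. The empirical variances from these simplified estimators need not match those reported for the paper's estimators.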