reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency

Authors: Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The theory is complemented with simulation experiments on the finite sample behavior of the estimators. ... 5. Simulation Studies This section presents two simulation studies. The first simulation study compares the asymptotic behaviors of semiparametric plug-in estimators introduced in Section 4. The considered ACE estimators are consistent and reach the respective efficiency bounds because they are based on √n-consistent estimators of the nuisance parameters. Therefore, we draw attention to the comparison of estimators' empirical variances. In the second simulation study, we investigate the robustness of the estimators under model misspecification.
Researcher Affiliation	Academia	Tetiana Gorbach EMAIL Xavier de Luna EMAIL Department of Statistics, Umeå University, 901 87 Umeå, Sweden; Juha Karvanen EMAIL Department of Mathematics and Statistics, P.O. Box 35 (MaD), FI-40014 University of Jyväskylä, Jyväskylä, Finland; Ingeborg Waernbaum EMAIL Department of Statistics, Box 513, 751 20 Uppsala University, Uppsala, Sweden
Pseudocode	No	The paper describes methods using mathematical notation and theoretical derivations, but it does not include any clearly labeled pseudocode or algorithm blocks. For example, Section 4 discusses "Semiparametric Estimation via Estimating Equation Estimators" but only describes the form of the estimating equation without presenting a structured algorithm.
Open Source Code	Yes	All computations were performed in R (R Core Team, 2020); the code is available at https://github.com/tetianagorbach/semiparametric_inference_ACE_BD_FD_TD_efficiency_robustness.
Open Datasets	No	The observed data is generated according to consistency assumptions as follows: C ∼ Bernoulli(0.5), A | C ∼ Bernoulli(expit(C)), Z | A, C ∼ N(βA, 1), Y | Z, A, C ∼ N(γ1Z + γ2C, 1). The paper describes how data was generated for simulations, not the use of external publicly available datasets with access information.
Dataset Splits	No	We consider samples of sizes n = 50, 100, 500, 1000, 5000, 10000, 20000, 30000, 40000, 50000. For each sample size, K = 1000 replicates were simulated. This describes the sampling strategy for simulations, not specific training/test/validation splits for an existing dataset.
Hardware Specification	No	This research was conducted using the resources of High Performance Computing Center North (HPC2N). This mentions a computing resource but lacks specific hardware details such as CPU/GPU models or memory.
Software Dependencies	Yes	All computations were performed in R (R Core Team, 2020)
Experiment Setup	Yes	The observed data is generated according to consistency assumptions as follows: C ∼ Bernoulli(0.5), A | C ∼ Bernoulli(expit(C)), Z | A, C ∼ N(βA, 1), Y | Z, A, C ∼ N(γ1Z + γ2C, 1). ... Here, we vary the effect of the mediator Z and covariate C on the outcome by considering eight data generating mechanisms corresponding to all combinations of β, γ1, γ2 ∈ {0.5, 1.5}. ... the parameters of all outcome models, E(Y | A, Z, C), E(Y | Z, C), and E(Y | A, Z), were estimated using ordinary least squares. Parameter β in the mediator model was also estimated using ordinary least squares. The density of a normal distribution was used in the estimation of p(Z | A). Further, p(A | C) was estimated using iteratively reweighted least squares method in the corresponding logistic regression, and p(A = 1) and p(C = 1) were consistently estimated using the proportion of A = 1 and C = 1, respectively.
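
The data generating mechanism quoted in the Experiment Setup row can be sketched in a few lines. The authors' own code is in R; the Python below is only an illustration, not their implementation. The function names (`simulate`, `beta_hat`, `monte_carlo`), the seed, and the small replicate count are assumptions made for this sketch. The β estimate uses the difference in group means of Z, which coincides with the ordinary least squares slope only when an intercept is included; whether the paper's mediator model includes an intercept is not stated here.

```python
import itertools
import math
import random


def expit(x):
    """Logistic function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, beta, gamma1, gamma2, rng):
    """Draw n observations (C, A, Z, Y) from the quoted mechanism."""
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0            # C ~ Bernoulli(0.5)
        a = 1 if rng.random() < expit(c) else 0       # A | C ~ Bernoulli(expit(C))
        z = rng.gauss(beta * a, 1.0)                  # Z | A, C ~ N(beta * A, 1)
        y = rng.gauss(gamma1 * z + gamma2 * c, 1.0)   # Y | Z, A, C ~ N(gamma1 * Z + gamma2 * C, 1)
        rows.append((c, a, z, y))
    return rows


def beta_hat(rows):
    """Difference in mean Z between the A = 1 and A = 0 groups."""
    z1 = [z for _, a, z, _ in rows if a == 1]
    z0 = [z for _, a, z, _ in rows if a == 0]
    if not z1 or not z0:
        raise ValueError("both treatment groups must be non-empty")
    return sum(z1) / len(z1) - sum(z0) / len(z0)


def monte_carlo(n, k, beta, gamma1, gamma2, seed=0):
    """Empirical mean and variance of beta_hat over k simulated replicates."""
    rng = random.Random(seed)
    estimates = [beta_hat(simulate(n, beta, gamma1, gamma2, rng)) for _ in range(k)]
    m = sum(estimates) / k
    var = sum((e - m) ** 2 for e in estimates) / (k - 1)
    return m, var


# The eight mechanisms: all combinations of beta, gamma1, gamma2 in {0.5, 1.5}.
SETTINGS = list(itertools.product([0.5, 1.5], repeat=3))

if __name__ == "__main__":
    for beta, gamma1, gamma2 in SETTINGS:
        # k = 200 keeps the sketch fast; the paper reports K = 1000 replicates.
        m, var = monte_carlo(n=500, k=200, beta=beta, gamma1=gamma1, gamma2=gamma2)
        print(f"beta={beta} gamma1={gamma1} gamma2={gamma2}: mean={m:.3f} var={var:.5f}")
```

Running the script prints one line per mechanism. The empirical variance across replicates is the quantity the first simulation study compares between estimators, although the estimators compared in the paper are the semiparametric ACE estimators rather than this simple β estimate.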