reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Proximal ID Algorithm

Authors: Ilya Shpitser, Zach Wood-Doughty, Eric J. Tchetgen Tchetgen

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate our approach by simulation studies and a data application. ... 7. Simulations ... We now turn to an array of simulation studies to demonstrate how the identifying assumptions of the proximal ID algorithm can enable unbiased estimation. ... 8. Analysis Of The Effect Of Methotrexate ... We now apply our proximal front-door estimator to an analysis of the effect of methotrexate (MTX) on tender joint count in patients with rheumatoid arthritis.
Researcher Affiliation	Academia	Ilya Shpitser EMAIL Department of Computer Science Johns Hopkins University Baltimore, MD 21218, USA; Zach Wood-Doughty EMAIL Department of Computer Science Northwestern University Evanston, IL 60208, USA; Eric J. Tchetgen Tchetgen EMAIL Department of Statistics The Wharton School 3620 Locust Walk, Philadelphia, PA 19104, USA
Pseudocode	No	The paper describes the 'ID algorithm' and 'proximal ID algorithm' conceptually, explaining their steps and operations (e.g., 'fixing operator φ'), and provides mathematical formulas. However, it does not present these algorithms in a structured pseudocode block or a clearly labeled algorithm environment. The description is primarily prose and mathematical notation.
Open Source Code	Yes	Our code implementing our methods and generating our datasets may be found in the following online repository: https://github.com/zachwooddoughty/proximal_id_algorithm. ... Code for preprocessing the data and reproducing these results are provided in the following repository: https://github.com/zachwooddoughty/proximal_id_algorithm.
Open Datasets	No	The dataset we examine, originally described in Choi et al. (2002), has been studied in several analyses (Fewell et al., 2004; Whittle and Hughes, 2004). ... Access to the dataset itself may be requested by contacting the third author.
Dataset Splits	No	The paper mentions generating synthetic datasets (e.g., 'we sample 64 datasets from each of four DGPs', 'each dataset contains 4000 samples') and using bootstrap resampling ('nonparametric bootstrap with 64 resamplings'). For the real-world application, it specifies patient counts ('1,010 patients'). However, it does not provide specific training, validation, or test splits (percentages or counts) for any of these datasets.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory specifications, or cloud computing instances) used for running its simulations or data application.
Software Dependencies	No	The paper does not specify any particular software libraries, frameworks, or tools with their version numbers that were used for implementing the methods or running the experiments.
Experiment Setup	Yes	First, we estimate a propensity score model for M, p(M | A, Z, C). Using this model to weight... we estimate this function using generalized method of moments (GMM). ... We truncate weights at the 2.5th and 97.5th percentiles. ... we sample 100 trajectories of Y(a = 1) and Y(a = 0) ... For each of the four DGPs we consider, we modify the parameters of the sampling distribution by changing the A → Y coefficient to a value β_AY ∈ {0, 0.2, 0.4, 0.8}. For each value β_AY, we sample 256 datasets of 4000 samples. ... nonparametric bootstrap with 64 resamplings to produce a 95% confidence interval.
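
The Experiment Setup row mentions two generic steps: truncating weights at the 2.5th and 97.5th percentiles, and a nonparametric bootstrap with 64 resamplings for a 95% confidence interval. The sketch below illustrates both using only the Python standard library. It is not taken from the authors' repository. The function names, the percentile-interval form of the bootstrap, the choice to truncate once before resampling, and the `weighted_mean` stand-in estimator are all assumptions made for illustration; the paper's actual estimator is a GMM-based proximal estimator, which is not reproduced here.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile of `values` for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def truncate_weights(weights, lower_q=2.5, upper_q=97.5):
    """Clip each weight to the [lower_q, upper_q] percentile range."""
    lo, hi = percentile(weights, lower_q), percentile(weights, upper_q)
    return [min(max(w, lo), hi) for w in weights]


def bootstrap_ci(data, estimator, n_resamples=64, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed interval type)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Resample rows with replacement, then re-run the estimator.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return (percentile(estimates, 100 * alpha / 2),
            percentile(estimates, 100 * (1 - alpha / 2)))


def weighted_mean(rows):
    """Hypothetical stand-in estimator: weighted mean of (y, w) rows."""
    total_w = sum(w for _, w in rows)
    return sum(y * w for y, w in rows) / total_w


# Usage on synthetic data (4000 samples, matching the dataset size quoted above).
rng = random.Random(1)
ys = [rng.gauss(1.0, 1.0) for _ in range(4000)]
ws = truncate_weights([rng.expovariate(1.0) + 0.1 for _ in ys])
low, high = bootstrap_ci(list(zip(ys, ws)), weighted_mean)
print(f"95% bootstrap interval: ({low:.3f}, {high:.3f})")
```

With only 64 resamples, the 2.5th and 97.5th percentiles of the bootstrap distribution are estimated from very few tail draws, so intervals of this kind are coarse. Whether the weights should be re-estimated and re-truncated inside each resample is a design choice this sketch does not settle.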