reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantifying Treatment Effects: Estimating Risk Ratios via Observational Studies

Authors: Ahmed Boughdiri, Julie Josse, Erwan Scornet

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through analyses on simulated and real-world datasets, we evaluate the performance of these estimators in terms of bias, efficiency, and robustness to generative data models. We also examine the coverage and length of the associated confidence intervals. ... In Section 4, we evaluate all estimators on observational data, and study the empirical properties of confidence intervals in terms of coverage and lengths. In Section 5, we extend our analysis to a semi-synthetic and a real-world dataset.
Researcher Affiliation	Academia	¹INRIA Sophia-Antipolis; ²Sorbonne Université and Université Paris Cité. Correspondence to: Ahmed Boughdiri <EMAIL>.
Pseudocode	No	The paper describes methods mathematically and textually but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	No	The paper does not provide any statement or link indicating the release of source code for the methodology described.
Open Datasets	Yes	To better illustrate the practical application and behavior of our estimators, we include a real-world study from Mayer et al. (2020) involving 8,270 patients with traumatic brain injury (TBI), using data extracted from the Traumabase.
Dataset Splits	No	The paper mentions generating datasets for simulations and subsampling real data for different sample sizes, but it does not provide specific training/test/validation splits with percentages, counts, or references to predefined splits for reproduction.
Hardware Specification	Yes	All our experiments were run on an 8GB M1 Mac.
Software Dependencies	No	For the simulations we have implemented all estimators in Python using Scikit-Learn for our regression and classification models. While Python and Scikit-Learn are mentioned, specific version numbers for these software dependencies are not provided.
Experiment Setup	No	The paper mentions estimating nuisance components via parametric (linear/logistic regression) or non-parametric methods (random forests) and using a 'high regularization parameter' or 'parameters determined by the training data size'. However, it does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, specific regularization strengths) or detailed optimizer settings in the main text.
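
Because the paper is reported to release no source code, the following is only an illustrative sketch of how a risk ratio could be estimated with parametric nuisance models in Python and Scikit-Learn. It is not the authors' implementation. The data-generating process, the function names (simulate, rr_g_formula, rr_ipw), the regularization value C=1.0, and the clipping threshold are all assumptions made for this example. The two estimators shown are generic textbook plug-in forms: an outcome-regression (G-formula) ratio and an inverse-propensity-weighted ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def simulate(n, seed=0):
    """Hypothetical confounded data with a binary outcome (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    # Treatment probability depends on X, which creates confounding.
    e = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    T = rng.binomial(1, e)
    # Outcome probability depends on both X and the treatment T.
    p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * T + 0.8 * X[:, 0])))
    Y = rng.binomial(1, p)
    return X, T, Y


def rr_g_formula(X, T, Y):
    """Plug-in risk ratio: mean fitted P(Y=1 | X, T=1) over mean fitted P(Y=1 | X, T=0)."""
    # One outcome model per treatment arm; C=1.0 is an assumed setting.
    m1 = LogisticRegression(C=1.0).fit(X[T == 1], Y[T == 1])
    m0 = LogisticRegression(C=1.0).fit(X[T == 0], Y[T == 0])
    mu1 = m1.predict_proba(X)[:, 1]
    mu0 = m0.predict_proba(X)[:, 1]
    return mu1.mean() / mu0.mean()


def rr_ipw(X, T, Y, clip=1e-3):
    """Inverse-propensity-weighted risk ratio with a fitted propensity score."""
    ps = LogisticRegression(C=1.0).fit(X, T)
    # Clip estimated propensities away from 0 and 1 to avoid exploding weights.
    e_hat = np.clip(ps.predict_proba(X)[:, 1], clip, 1.0 - clip)
    treated_mean = np.mean(T * Y / e_hat)
    control_mean = np.mean((1 - T) * Y / (1.0 - e_hat))
    return treated_mean / control_mean


if __name__ == "__main__":
    X, T, Y = simulate(5000, seed=0)
    print("G-formula risk ratio:", rr_g_formula(X, T, Y))
    print("IPW risk ratio:", rr_ipw(X, T, Y))
```

Recording the seed, the sample size, and explicit hyperparameter values such as C in this way is the kind of detail the Experiment Setup and Software Dependencies rows flag as missing.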