Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantifying Treatment Effects: Estimating Risk Ratios via Observational Studies

Authors: Ahmed Boughdiri, Julie Josse, Erwan Scornet

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through analyses on simulated and real-world datasets, we evaluate the performance of these estimators in terms of bias, efficiency, and robustness to generative data models. We also examine the coverage and length of the associated confidence intervals. ... In Section 4, we evaluate all estimators on observational data, and study the empirical properties of confidence intervals in terms of coverage and lengths. In Section 5, we extend our analysis to a semi-synthetic and a real-world dataset.
Researcher Affiliation | Academia | ¹INRIA Sophia-Antipolis ²Sorbonne Université and Université Paris Cité. Correspondence to: Ahmed Boughdiri <EMAIL>.
Pseudocode | No | The paper describes methods mathematically and textually but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper does not provide any statement or link indicating the release of source code for the methodology described.
Open Datasets | Yes | To better illustrate the practical application and behavior of our estimators, we include a real-world study from Mayer et al. (2020) involving 8,270 patients with traumatic brain injury (TBI), using data extracted from the Traumabase.
Dataset Splits | No | The paper mentions generating datasets for simulations and subsampling real data for different sample sizes, but it does not provide specific training/test/validation splits with percentages, counts, or references to predefined splits for reproduction.
Hardware Specification | Yes | All our experiments were run on a 8GB M1 Mac.
Software Dependencies | No | For the simulations we have implemented all estimators in Python using Scikit-Learn for our regression and classification models. While Python and Scikit-Learn are mentioned, specific version numbers for these software dependencies are not provided.
Experiment Setup | No | The paper mentions estimating nuisance components via parametric (linear/logistic regression) or non-parametric methods (random forests) and using a 'high regularization parameter' or 'parameters determined by the training data size'. However, it does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, specific regularization strengths) or detailed optimizer settings in the main text.
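
Since no source code is released, the following is only a minimal illustrative sketch of what estimating a risk ratio with Scikit-Learn nuisance models might look like. It is not the authors' implementation. It assumes a plug-in (outcome-regression) estimator with default logistic regressions, and the data-generating model, the function names `simulate` and `plugin_risk_ratio`, and all coefficients are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def simulate(n, seed=0):
    """Toy observational data (hypothetical generative model, not from the paper)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    # Treatment assignment depends on the covariates, which creates confounding.
    p_treat = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))
    T = rng.binomial(1, p_treat)
    # Binary outcome depends on the first covariate and on the treatment.
    p_out = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * X[:, 0] + 1.0 * T)))
    Y = rng.binomial(1, p_out)
    return X, T, Y


def plugin_risk_ratio(X, T, Y):
    """Plug-in (outcome-regression) estimate of E[Y(1)] / E[Y(0)]."""
    # Fit one outcome model per treatment arm (assumed nuisance models).
    mu1 = LogisticRegression().fit(X[T == 1], Y[T == 1])
    mu0 = LogisticRegression().fit(X[T == 0], Y[T == 0])
    # Average the predicted outcome probabilities over the full sample.
    m1 = mu1.predict_proba(X)[:, 1].mean()
    m0 = mu0.predict_proba(X)[:, 1].mean()
    return m1 / m0


X, T, Y = simulate(5000)
rr_hat = plugin_risk_ratio(X, T, Y)
# Unadjusted ratio of observed outcome rates, shown for comparison only.
naive_rr = Y[T == 1].mean() / Y[T == 0].mean()
print(f"plug-in risk ratio: {rr_hat:.3f}, naive ratio of means: {naive_rr:.3f}")
```

Swapping `LogisticRegression` for `RandomForestClassifier` would give a non-parametric variant of the same sketch; the hyperparameters the paper actually used are not reported in the main text, so none are assumed here.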