Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources

Authors: Vibhhu Sharma, Bryan Wilder

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we use data from 5 real-world RCTs in a variety of domains to empirically assess such choices. We find that when treatment effects can be estimated with high accuracy (which we simulate by allowing the model to partially observe outcomes in advance), treatment effect based targeting substantially outperforms risk-based targeting, even when treatment effect estimates are biased. Moreover, these results hold even when the policymaker has strong normative preferences for assisting higher-risk individuals.
Researcher Affiliation Academia Vibhhu Sharma & Bryan Wilder, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15232, USA
Pseudocode No The paper describes methods and procedures in narrative text and mathematical formulas (e.g., Section 3.2, 3.3, Appendix A.3) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Reproducibility Statement: The supplementary material includes code including data preprocessing and experimentation for each of the datasets. We also detail our procedures in the Appendix A (dataset details) and in Section 4 (step by step experimental procedure).
Open Datasets Yes We conduct experiments on a variety of RCTs across different domains as detailed below: Targeting the Ultra Poor (TUP) in India (Banerjee et al., 2021): ... NSW (National Supported Work demonstration) Dataset (Dehejia & Wahba, 1999; 2002; LaLonde, 1986): ... Postoperative Pain Dataset: Patients undergoing operations like tracheal intubations often experience throat pain following treatment (McHardy & Chung, 1999). ... Acupuncture Dataset (Vickers et al., 2004): ... Tennessee's Student Teacher Achievement Ratio (STAR) project (Achilles et al., 2008):
Dataset Splits No A.1 EXPERIMENT DETAILS: Real Setting: We divide the RCT data into two splits such that one split is used for training nuisance functions and the other split is used entirely for evaluation. Semi-synthetic Setting: We divide the RCT into two splits such that we use each split to obtain treatment effect estimates for the other split and make maximal use of available data. While the paper mentions dividing data into 'two splits', it does not provide specific percentages, sample counts, or reproducible details for these splits.
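The two-split procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the 50/50 split, the random seed, and the `fit`/`predict` callables are all assumptions, since the paper does not specify split proportions or sample counts.

```python
import numpy as np

def two_split_cross_fit(X, Y, fit, predict, rng=None):
    """Divide the data into two splits; fit a nuisance model on each
    split and predict on the other, so every unit receives an
    out-of-sample estimate (the 'semi-synthetic' variant above)."""
    rng = np.random.default_rng(rng)
    n = len(Y)
    idx = rng.permutation(n)
    a, b = idx[: n // 2], idx[n // 2 :]  # assumed 50/50 split
    preds = np.empty(n)
    preds[b] = predict(fit(X[a], Y[a]), X[b])
    preds[a] = predict(fit(X[b], Y[b]), X[a])
    return preds
```

For the "real setting" described above, one would instead use split `a` only for training and split `b` only for evaluation, without swapping.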
Hardware Specification No The paper does not provide any specific details about the hardware used for running the experiments, such as GPU/CPU models, memory, or cloud resources.
Software Dependencies No The paper mentions using a 'random forest regressor', a 'kernel regression method', and a 'doubly-robust estimator', but does not specify any software libraries (e.g., scikit-learn, PyTorch) or their version numbers.
Experiment Setup No The paper describes the general methodology for estimating treatment effects and baseline risk, including the use of doubly-robust estimators, random forest regressors, and kernel regression. It also describes how confounding is introduced and different welfare functions. However, it lacks specific hyperparameters for the machine learning models (e.g., number of trees in random forest, learning rates, kernel parameters) that would be necessary for exact reproduction of the experiments.
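The combination of a doubly-robust estimator with random forest nuisance models, as described above, can be sketched as follows. This is a hedged illustration under assumed defaults (scikit-learn random forests with default hyperparameters, a known RCT treatment probability `p`), precisely the settings the paper leaves unspecified:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def doubly_robust_scores(X, T, Y, p=0.5):
    """Per-unit doubly-robust treatment-effect scores for an RCT with
    known treatment probability p:

        phi_i = mu1(x_i) - mu0(x_i)
                + T_i * (Y_i - mu1(x_i)) / p
                - (1 - T_i) * (Y_i - mu0(x_i)) / (1 - p)

    Hyperparameters (e.g., number of trees) are scikit-learn defaults,
    an assumption rather than the paper's configuration.
    """
    mu1 = RandomForestRegressor(random_state=0).fit(X[T == 1], Y[T == 1])
    mu0 = RandomForestRegressor(random_state=0).fit(X[T == 0], Y[T == 0])
    m1, m0 = mu1.predict(X), mu0.predict(X)
    return m1 - m0 + T * (Y - m1) / p - (1 - T) * (Y - m0) / (1 - p)
```

In practice these scores would be computed with the cross-fitting split described in the Dataset Splits row, so that each unit's nuisance predictions come from a model trained on the other split.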