reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Doubly Robust Fusion of Many Treatments for Policy Learning

Authors: Ke Zhu, Jianing Chu, Ilya Lipkovich, Wenyu Ye, Shu Yang

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation studies show superior group recovery and policy value compared to existing approaches. We illustrate the practical utility of our method using a nationwide electronic health record-derived deidentified database containing data from patients with Chronic Lymphocytic Leukemia and Small Lymphocytic Lymphoma.
Researcher Affiliation	Collaboration	(1) Department of Statistics, North Carolina State University, Raleigh, NC 27695, U.S.A. (2) Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, U.S.A. (3) Amazon (This work was done prior to joining Amazon) (4) Eli Lilly & Company, Indianapolis, IN 46285, U.S.A.
Pseudocode	Yes	Algorithm 1: Calibration-Weighted Treatment Fusion. Input: Data {(X_i, A_i, Y_i)}_{i=1}^n. for a = 1, ..., K do ... Algorithm 2: Cross-Fitted AIPW Policy Learning. Input: Data {(X_i, A_i, Y_i)}_{i=1}^n; Group mapping δ. Split the data into L folds. for l = 1, ..., L do ...
Open Source Code	No	The paper does not provide an explicit statement or link for the availability of source code.
Open Datasets	No	The paper uses a "nationwide Flatiron Health electronic health record-derived deidentified database". While it cites related work on this database (Ma et al., 2020; Birnbaum et al., 2020), it does not provide concrete access information, a direct link, or a clear statement that this proprietary database is openly available for download.
Dataset Splits	Yes	Algorithm 2: Cross-Fitted AIPW Policy Learning ... Split the data into L folds. for l = 1, ..., L do ...
Hardware Specification	No	The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies	No	The paper mentions "R package policytree" but does not specify a version number. No other specific software versions are provided.
Experiment Setup	Yes	As a baseline, we implemented the CAIPWL method (Zhou et al., 2023) without calibration weighting or fusion, using the default tuning in the R package policytree to learn a depth-3 policy tree. ... In the fusion step, fused lasso uses extended BIC (Chen & Chen, 2008) for model selection. Treatments are grouped if the Euclidean distance between their fused lasso estimates is less than 0.25.
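
The grouping rule quoted in the Experiment Setup row (treatments are grouped if the Euclidean distance between their fused lasso estimates is less than 0.25) can be illustrated with a short sketch. This is not the authors' code, since no source code is released. The function name `group_treatments`, the toy coefficient vectors, and the choice to merge groups transitively (if A is close to B and B is close to C, all three share a group) are assumptions made for illustration only.

```python
import math


def group_treatments(estimates, threshold=0.25):
    """Assign a group label to each treatment.

    `estimates` is a list of per-treatment coefficient vectors
    (hypothetical stand-ins for fused lasso estimates). Two treatments
    are linked when their Euclidean distance is below `threshold`;
    linked treatments are merged transitively (an assumption, not a
    detail confirmed by the paper).
    """
    k = len(estimates)
    parent = list(range(k))  # union-find forest: each treatment starts alone

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if math.dist(estimates[i], estimates[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root of the merged group.
                    parent[max(ri, rj)] = min(ri, rj)

    # Relabel roots as consecutive group ids in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(k)]


if __name__ == "__main__":
    # Toy vectors: the first two are close together, as are the last two.
    toy = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.1]]
    print(group_treatments(toy))
```

The returned list plays the role of a group mapping: entry i is the fused group of treatment i. A non-transitive rule (for example, requiring every pair inside a group to be within the threshold) would be an equally plausible reading of the quoted sentence and could produce different groups.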