Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence
Authors: Joseph Paillard, Angel David Reyero Lobo, Vitaliy Kolodyazhniy, Bertrand Thirion, Denis-Alexander Engemann
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the benefits of PermuCATE in simulated and real-world health datasets, including settings with up to hundreds of correlated variables. |
| Researcher Affiliation | Collaboration | 1Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Basel, Switzerland; 2Université Paris-Saclay, Inria, CEA, Palaiseau, France; 3Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, France. |
| Pseudocode | Yes | Algorithm 1 Conditional Permutation Importance for CATE |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that source code for the described methodology is publicly available. While it mentions 'All proofs and additional experiments are given in appendix,' this does not include code. |
| Open Datasets | Yes | The Infant Health and Development Program (IHDP). The dataset consists of 747 subjects with 25 real covariates, including 6 continuous and 19 binary variables, along with a simulated outcome that is both non-linear and noisy (Shalit et al., 2017). |
| Dataset Splits | Yes | The importance of variables was estimated using a nested cross-fitting scheme. In each split, 20% of the data was left out for the importance estimation. The remaining 80% was used to fit the DR-learner using the cross-validation scheme presented in Kennedy (2023). We used a five-fold cross-fitting strategy for both loops. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | For linear models, we used the scikit-learn implementations RidgeCV for regression and LogisticRegressionCV... For the gradient boosting trees we used the implementations HistGradientBoostingClassifier and HistGradientBoostingRegressor... Causal Forest (CF) with 100 trees (Athey & Wager, 2019). Specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | To estimate the CATE, we used a DR-learner (Kennedy, 2023) with regularized linear models for the nuisance functions and the final regression step. For PermuCATE, we used the same regularized linear model for covariate prediction and used 50 permutations... For the gradient boosting trees we used the implementations HistGradientBoostingClassifier and HistGradientBoostingRegressor respectively for classification and regression. After using a randomized search for hyper-parameters we used a learning rate of 0.1 (range explored: [10⁻³, 10³]) and a maximum number of leaves for each tree of 10 (range explored: [10, 100]). |
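The nested cross-fitting scheme described in the Dataset Splits row (an outer five-fold loop that holds out 20% of the data for importance estimation, with the remaining 80% fitted via an inner five-fold loop) can be sketched as follows. This is a minimal illustration with placeholder data and no real learner; the inner-loop fitting of nuisance and final models is only indicated by comments.

```python
# Minimal sketch of the paper's nested five-fold cross-fitting scheme.
# The data and the learner are placeholders, not the paper's IHDP setup.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # placeholder covariates
y = X[:, 0] + rng.normal(size=100)       # placeholder outcome

outer = KFold(n_splits=5, shuffle=True, random_state=0)  # 20% held out per split
inner = KFold(n_splits=5, shuffle=True, random_state=1)  # cross-fitting for the learner

for fit_idx, imp_idx in outer.split(X):
    # 80% of the data: fit the DR-learner with inner five-fold cross-fitting,
    # e.g. nuisance models on one part and the final regression on the other.
    for nuis_idx, final_idx in inner.split(fit_idx):
        pass  # placeholder for nuisance / final-model fitting
    # 20% of the data: held out for variable-importance estimation.
    X_imp, y_imp = X[imp_idx], y[imp_idx]
```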
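The Pseudocode row references "Algorithm 1: Conditional Permutation Importance for CATE". A generic conditional-permutation-importance step for a fitted predictor can be sketched as below: the j-th covariate is predicted from the others (here with RidgeCV, matching the linear-model setting in the table), its residuals are shuffled over 50 permutations as stated in the Experiment Setup row, and the average increase in prediction loss is reported. This is an illustrative CPI sketch, not the paper's exact Algorithm 1 for the CATE.

```python
# Hedged sketch of conditional permutation importance (CPI) for one covariate.
# Generic supervised setting; the paper applies the idea to CATE estimation.
import numpy as np
from sklearn.linear_model import RidgeCV

def conditional_permutation_importance(model, X, y, j, n_permutations=50, seed=0):
    rng = np.random.default_rng(seed)
    base_loss = np.mean((y - model.predict(X)) ** 2)
    # Predict covariate j from the remaining covariates (covariate-prediction
    # model; RidgeCV is an assumption matching the table's linear setting).
    X_minus_j = np.delete(X, j, axis=1)
    x_j_hat = RidgeCV().fit(X_minus_j, X[:, j]).predict(X_minus_j)
    residuals = X[:, j] - x_j_hat
    losses = []
    for _ in range(n_permutations):
        X_perm = X.copy()
        # Shuffle only the residuals: this breaks the link to the outcome
        # while preserving the dependence on the other covariates.
        X_perm[:, j] = x_j_hat + rng.permutation(residuals)
        losses.append(np.mean((y - model.predict(X_perm)) ** 2))
    return np.mean(losses) - base_loss
```

An important covariate yields a clearly positive loss increase, while a covariate the model ignores yields an importance near zero.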