reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Estimation of Partial Dependence Functions using Trees

Authors: Jinyang Liu, Tessa Steensgaard, Marvin N. Wright, Niklas Pfister, Munir Hiabu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We consider a supervised learning setup where training data D_n^train := {(Y^(i), X^(i))}_{i ∈ [n]} are iid sampled from a distribution P with correlated features X^(i). For all experiments, an XGBoost estimator, m̂, was trained on D_n^train and subsequently explained using the same training samples as background data. Specific simulation settings varied depending on the analysis. For the inconsistency and MSE analyses (Figures 2, 3 and 5), we used d = 2 covariates sampled from a bivariate Gaussian distribution with correlation 0.3 and variance of 1. XGBoost hyperparameters (nrounds ∈ {1, ..., 1000}, eta ∈ [0.01, 0.3], max_depth ∈ {2, ..., 6}) were tuned via 5-fold cross-validation with 50 random search evaluations. For the runtime comparison (Figure 4), we used d = 7 covariates sampled from a multivariate Gaussian distribution (details in Appendix) and a single fixed XGBoost model configuration with 20 trees and a maximum depth of D = 5. All other hyperparameters were left as default, except for eta, which was drawn uniformly from [0.01, 0.3]. All simulations were conducted on a dedicated compute cluster (2 Intel Xeon Gold 6230 @ 2.1 GHz CPUs, 192 GB RAM). Our implementation of the FastPD algorithm is available as an R package on GitHub. Inconsistency of TreeSHAP-path: Figure 2 illustrates the SHAP explanations of m̂ for X_1 in 500 observations of (Y, X) ∈ ℝ × ℝ². Comparison of Computational Runtime: Figure 4 compares the runtime of extracting the PD functions for all S using FastPD with computing the interventional SHAP values as implemented in the SHAP Python package. We have conducted experiments on a significant number of curated datasets in both regression and classification settings. We used the OpenML-CTR23 (Fischer et al., 2023) regression datasets and the OpenML-CC18 (Bischl et al., 2021) classification datasets.
Researcher Affiliation	Collaboration	(1) Department of Mathematical Sciences, University of Copenhagen, Denmark; (2) Faculty of Mathematics and Computer Science, Universität Bremen, Germany; (3) Leibniz Institute for Prevention Research and Epidemiology BIPS, Germany; (4) Department of Public Health, University of Copenhagen, Denmark; (5) Lakera, Switzerland. Correspondence to: Jinyang Liu <EMAIL>.
Pseudocode	Yes	Algorithm 1: FastPD augmentation step. Nodes are indexed by j, where l_j and r_j represent the indices of the left and right child nodes, respectively. The feature used to split at node j is denoted by d_j, and t_j is the split threshold and v_j the value. Algorithm 2: FastPD evaluation step to calculate v̂_S(x_S). To be applied after augmenting the tree as in Algorithm 1.
Open Source Code	Yes	The implementation is available as an R package on GitHub: https://github.com/PlantedML/glex. Our implementation of the FastPD algorithm is available as an R package on GitHub.
Open Datasets	Yes	We used the OpenML-CTR23 (Fischer et al., 2023) regression datasets and the OpenML-CC18 (Bischl et al., 2021) classification datasets. A particularly illustrative additional example is the adult dataset, which contains data on whether an individual's income exceeds $50,000 per year (Becker & Kohavi, 1996).
Dataset Splits	Yes	XGBoost hyperparameters (nrounds ∈ {1, ..., 1000}, eta ∈ [0.01, 0.3], max_depth ∈ {2, ..., 6}) were tuned via 5-fold cross-validation with 50 random search evaluations.
Hardware Specification	Yes	All simulations were conducted on a dedicated compute cluster (2 Intel Xeon Gold 6230 @ 2.1 GHz CPUs, 192 GB RAM). All numerical experiments were conducted using R-4.4.1 or Python-3.12 on a dedicated cluster with 2 Intel Xeon Gold 6302@2.1 GHz CPUs and 192 GB of memory.
Software Dependencies	Yes	All numerical experiments were conducted using R-4.4.1 or Python-3.12 on a dedicated cluster with 2 Intel Xeon Gold 6302@2.1 GHz CPUs and 192 GB of memory. We modified the existing R package glex to compute the PD functions using FastPD and the path-dependent algorithm, which is due to (Friedman, 2001) but also reproduced as Algorithm 1 in (Lundberg et al., 2020). For computing the SHAP values using TreeSHAP-int: We used Python-3.12 and modified the shap package to compute the SHAP explanations for all features using arbitrarily many background samples. For computing the SHAP values using Zern et al. (2023): We used Python 3.12 and the pltreeshap package to compute the SHAP explanations for all features using arbitrarily many background samples.
Experiment Setup	Yes	XGBoost hyperparameters (nrounds ∈ {1, ..., 1000}, eta ∈ [0.01, 0.3], max_depth ∈ {2, ..., 6}) were tuned via 5-fold cross-validation with 50 random search evaluations. A single fixed XGBoost model configuration with 20 trees and a maximum depth of D = 5. All other hyperparameters were left as default, except for eta, which was drawn uniformly from [0.01, 0.3]. The hyperparameters nrounds ∈ {10, ..., 200}, max_depth ∈ {1, ..., 5}, eta ∈ [0, 0.5], colsample_bytree ∈ [0.5, 1] and subsample ∈ [0.5, 1] were tuned via random search with 5-fold cross-validation over 100 random search evaluations. We ran randomized grid search with 5-fold CV to tune XGBoost hyperparameters (max_depth ∈ {3, ..., 7}, eta ∈ {0.01, 0.05, 0.1}, nrounds ∈ {100, 200, 300}, min_child_weight ∈ {1, 3, 5}, subsample ∈ {0.6, 0.8, 1.0}, colsample_bytree ∈ {0.6, 0.8, 1.0}) over 25 trials. We then visualized the age relationship interaction component (Figure 6) and noticed that FastPD and Friedman-path offered conflicting interpretations: With FastPD, we notice that at prime working age there is a slight positive effect on income for husbands, while the effect is close to zero and/or slightly negative for wives. In contrast, the path-dependent algorithm Friedman-path estimates the effect to be zero for both wives and husbands at working age, suggesting that age has the same effect on a husband's and wife's income in this range.
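
The first tuning protocol quoted in the Experiment Setup row (50 random search evaluations scored by 5-fold cross-validation over nrounds, eta and max_depth) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and `score_fn` is a placeholder the reader must supply (for example, fit XGBoost on the training indices and return the validation error). How the authors sampled within each range is not stated; uniform sampling is assumed here.

```python
import random


def sample_config(rng):
    """Draw one configuration from the quoted search space (uniform sampling is an assumption)."""
    return {
        "nrounds": rng.randint(1, 1000),  # integer in {1, ..., 1000}
        "eta": rng.uniform(0.01, 0.3),    # real number in [0.01, 0.3]
        "max_depth": rng.randint(2, 6),   # integer in {2, ..., 6}
    }


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k disjoint validation folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_search(score_fn, n, n_evals=50, k=5, seed=0):
    """Return the configuration with the lowest mean validation score.

    score_fn(config, train_idx, val_idx) is a hypothetical user-supplied
    callable that fits a model on train_idx and returns an error on val_idx.
    """
    rng = random.Random(seed)
    folds = kfold_indices(n, k, rng)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_evals):
        cfg = sample_config(rng)
        fold_scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            fold_scores.append(score_fn(cfg, train_idx, val_idx))
        mean_score = sum(fold_scores) / k
        if mean_score < best_score:
            best_cfg, best_score = cfg, mean_score
    return best_cfg, best_score
```

The same folds are reused for every sampled configuration so that candidates are compared on identical splits; whether the paper did this is not stated in the excerpts above.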