Sampling Permutations for Shapley Value Estimation
Authors: Rory Mitchell, Joshua Cooper, Eibe Frank, Geoffrey Holmes
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of permutation sampling strategies on tabular data, image data, and in terms of data-independent discrepancy scores. The experimental evaluation proceeds as follows: Section 5.1 first evaluates existing algorithms on tabular data using GBDT models, reporting exact error scores. MC-Antithetic emerges as the clear winner, so we use this as a baseline in subsequent experiments against newly proposed algorithms. |
| Researcher Affiliation | Collaboration | Rory Mitchell EMAIL Nvidia Corporation Santa Clara CA 95051, USA; Joshua Cooper EMAIL Department of Mathematics University of South Carolina 1523 Greene St. Columbia, SC 29223, USA; Eibe Frank EMAIL Department of Computer Science University of Waikato Hamilton, New Zealand |
| Pseudocode | Yes | Algorithm 1: Sequential Bayesian Quadrature; Algorithm 2: Sample permutation from S^{d-2}; Algorithm 3: Sample k = 2(d-1) permutations from S^{d-2}; Algorithm 4: Sobol Permutations |
| Open Source Code | No | The text does not contain any explicit statement from the authors about releasing their own source code, nor does it provide a direct link to a code repository for the methodology described in the paper. It only mentions the use of third-party libraries like XGBoost, SHAP, and scikit-learn. |
| Open Datasets | Yes | Table 2: Tabular datasets: adult, breast cancer, bank, cal housing, make regression, year, all cited with authors and years. For example, 'adult Kohavi (1996)'. Also, the 'ImageNet 2012 dataset of Russakovsky et al. (2015)' is mentioned. |
| Dataset Splits | No | Models are trained using the entire dataset (no test/train split) using the default parameters of the XGBoost library (100 boosting iterations, maximum depth 6, learning rate 0.3, mean squared error objective for regression, and binary logistic objective for classification). The GBDT models likewise use the entire dataset for training, so no dataset splits are specified. |
| Hardware Specification | Yes | Runtime (in seconds) is also reported, where permutation sets are generated using a single thread of a Xeon E5-2698 CPU. Generating Shapley values for an image using 100 permutation samples and 256 features requires 100 × (256 + 1) = 25,700 model evaluations, taking around 40s on an Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions software libraries such as 'XGBoost library', 'SHAP software package', and 'scikit-learn library', but does not provide specific version numbers for any of these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Models are trained using the entire dataset (no test/train split) using the default parameters of the XGBoost library (100 boosting iterations, maximum depth 6, learning rate 0.3, mean squared error objective for regression, and binary logistic objective for classification). The neural network model is trained using the scikit-learn library (Pedregosa et al., 2011) with default parameters: a single hidden layer of 100 neurons, a ReLU activation function, and trained with the Adam optimiser (Kingma and Ba, 2014) for 200 iterations with an initial learning rate of 0.001. |
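The paper's strongest baseline, MC-Antithetic, pairs each sampled feature permutation with its reverse so that the two marginal-contribution passes are negatively correlated, reducing estimator variance. The sketch below illustrates the general idea on an arbitrary set-valued game; it is a minimal illustration, not the authors' implementation, and the function name `shapley_mc_antithetic` and the toy additive game in the usage example are assumptions introduced here.

```python
import random

def shapley_mc_antithetic(value, d, n_pairs, seed=0):
    """Estimate Shapley values for a d-player game `value` (a function
    from a set of feature indices to a float) using antithetic
    permutation sampling: each uniformly sampled permutation is
    evaluated together with its reverse, and the marginal
    contributions from both orderings are averaged."""
    rng = random.Random(seed)
    phi = [0.0] * d
    for _ in range(n_pairs):
        perm = list(range(d))
        rng.shuffle(perm)
        for p in (perm, perm[::-1]):  # antithetic pair: permutation and its reverse
            coalition = set()
            prev = value(coalition)
            for feature in p:
                coalition.add(feature)
                cur = value(coalition)
                phi[feature] += cur - prev  # marginal contribution of `feature`
                prev = cur
    # Each of the n_pairs iterations contributes two permutations.
    return [v / (2 * n_pairs) for v in phi]
```

For a purely additive game, e.g. `value = lambda S: sum([1.0, 2.0, 3.0][i] for i in S)`, every permutation yields the same marginal contributions, so the estimate recovers the exact Shapley values `[1.0, 2.0, 3.0]` regardless of the number of samples.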