reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Faith-Shap: The Faithful Shapley Interaction Index

Authors: Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We first provide some experiments validating the relative computational efficiency of computing our Faith-Interaction indices, followed by quantitative and qualitative demonstrations of their use as explanations of ML models over a language dataset. The language dataset we use throughout the experiment is the simplified IMDB (Maas et al., 2011) dataset, where the model only uses the first two sentences of movie reviews as input, and predicts the probability of the reviews being positive. The model being explained is a BERT language model (Devlin et al., 2018) with 0.82 accuracy on the test set.
Researcher Affiliation	Academia	Che-Ping Tsai, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA; Chih-Kuan Yeh, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA; Pradeep Ravikumar, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA
Pseudocode	Yes	Algorithm 1: Permutation-based sampling algorithm for the top-order Shapley Taylor index. Input: a value function v : 2^d → ℝ, maximum order ℓ. begin; sum[S] ← 0 for all sets S ⊆ [d] with size ℓ; count[S] ← 0 for all sets S ⊆ [d] with size ℓ; for t = 1, 2, ... do: π ← {i_1, …, i_d}, a random ordering of {1, 2, …, d}; for all sets S ⊆ [d] with size ℓ do: i_k ← the leftmost element of S in the ordering π; T ← {i_1, …, i_{k−1}}, the set of predecessors of i_k in π; sum[S] ← sum[S] + Δ_S(v(T)); count[S] ← count[S] + 1; end; end; indices[S] ← sum[S]/count[S] for all sets S ⊆ [d] with size ℓ; return indices; end
Open Source Code	No	The paper does not provide a concrete statement about code release for the methodology described, nor does it include a direct link to a code repository.
Open Datasets	Yes	The language dataset we use throughout the experiment is the simplified IMDB (Maas et al., 2011) dataset... Portuguese marketing dataset (Moro et al., 2014): this is a tabular dataset with d = 17 features.
Dataset Splits	Yes	We used 25,000 reviews for training and 25,000 reviews for evaluation.
Hardware Specification	No	No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) were found.
Software Dependencies	No	No specific software dependencies with version numbers were found in the paper. The paper mentions models like 'BERT language model' and 'xgboost model' and methods like 'Lasso' but without associated software versions.
Experiment Setup	Yes	For the Faith-Shap interaction index, we use Eqn. (20) and solve the corresponding linear regression problem with ℓ1 regularization, with regularization parameters α = 10^{-3} and α = 10^{-6} for the simplified IMDB dataset and the bank dataset, respectively. We used 4000 samples to estimate both Faithful Shapley Interaction indices and Shapley Taylor indices. We use Lasso with regularization parameter α = 0.001 to estimate Faithful Shapley Interaction indices and the permutation-based sampling method to estimate the highest-order Shapley Taylor indices (ℓ = 2).
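
The permutation-based sampler quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' code (the entry above records that none was released). The function names `discrete_derivative` and `top_order_shapley_taylor`, the fixed permutation budget, the `seed` parameter, and the 0-based feature indexing are all assumptions made for the example. The discrete derivative is assumed to be the standard one, Δ_S v(T) = Σ_{L ⊆ S} (−1)^{|S|−|L|} v(T ∪ L).

```python
import itertools
import random


def discrete_derivative(v, S, T):
    """Assumed form: sum over L subset of S of (-1)^(|S|-|L|) * v(T union L)."""
    base = frozenset(T)
    total = 0.0
    for r in range(len(S) + 1):
        sign = (-1) ** (len(S) - r)
        for L in itertools.combinations(S, r):
            total += sign * v(base | frozenset(L))
    return total


def top_order_shapley_taylor(v, d, order, num_permutations, seed=0):
    """Monte Carlo estimate of the order-`order` Shapley Taylor indices.

    v: value function taking a frozenset of feature indices in range(d).
    Returns a dict mapping each size-`order` tuple S to its estimate.
    """
    rng = random.Random(seed)
    subsets = list(itertools.combinations(range(d), order))
    sums = {S: 0.0 for S in subsets}
    features = list(range(d))

    for _ in range(num_permutations):
        rng.shuffle(features)  # random ordering pi
        position = {f: idx for idx, f in enumerate(features)}
        for S in subsets:
            # Position of the leftmost element of S in pi.
            k = min(position[i] for i in S)
            # T is the set of predecessors of that element in pi.
            T = features[:k]
            sums[S] += discrete_derivative(v, S, T)

    # Every subset is visited once per permutation, so count[S] in the
    # pseudocode equals num_permutations for all S.
    return {S: total / num_permutations for S, total in sums.items()}


if __name__ == "__main__":
    # Hypothetical toy game: payoff 1 only when features 0 and 1 are both present.
    def toy_value(coalition):
        return 1.0 if {0, 1} <= coalition else 0.0

    print(top_order_shapley_taylor(toy_value, d=3, order=2, num_permutations=50))
```

In the toy game, all of the value comes from features 0 and 1 appearing together, so the estimate for the pair (0, 1) is 1 and the other pairs are 0 regardless of which permutations are drawn. For a real model the value function would wrap model evaluations on masked inputs, and the number of permutations would be set by the sampling budget (the paper reports 4000 samples).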