Faith-Shap: The Faithful Shapley Interaction Index
Authors: Che-Ping Tsai, Chih-Kuan Yeh, Pradeep Ravikumar
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first provide some experiments validating the relative computational efficiency of computing our Faith-Interaction indices, followed by quantitative and qualitative demonstrations of their use as explanations of ML models over a language dataset. The language dataset we use throughout the experiment is the simplified IMDB (Maas et al., 2011) dataset, where the model only uses the first two sentences of movie reviews as input, and predicts the probability of the reviews being positive. The model being explained is a BERT language model (Devlin et al., 2018) with 0.82 accuracy on the test set. |
| Researcher Affiliation | Academia | Che-Ping Tsai, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA; Chih-Kuan Yeh, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA; Pradeep Ravikumar, Department of Machine Learning, Carnegie Mellon University, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1: Permutation-based sampling algorithm for the top-order Shapley Taylor index. input: a value function v : 2^d → ℝ, maximum order ℓ. begin sum[S] ← 0 for all sets S ⊆ [d] with size ℓ. count[S] ← 0 for all sets S ⊆ [d] with size ℓ. for t = 1, 2, ... do π ← {i_1, …, i_d}, a random ordering of {1, 2, …, d}. for all sets S ⊆ [d] with size ℓ do i_k ← the leftmost element of S in the ordering π. T ← {i_1, …, i_{k−1}}, the set of predecessors of i_k in π. sum[S] ← sum[S] + δ_S v(T). count[S] ← count[S] + 1. end end indices[S] ← sum[S]/count[S] for all sets S ⊆ [d] with size ℓ. return indices end |
| Open Source Code | No | The paper does not provide a concrete statement about code release for the methodology described, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | The language dataset we use throughout the experiment is the simplified IMDB (Maas et al., 2011) dataset... Portuguese marketing dataset (Moro et al., 2014): this is a tabular dataset with d = 17 features. |
| Dataset Splits | Yes | We used 25,000 reviews for training and 25,000 reviews for evaluation. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) were found. |
| Software Dependencies | No | No specific software dependencies with version numbers were found in the paper. The paper mentions models like 'BERT language model' and 'xgboost model' and methods like 'Lasso' but without associated software versions. |
| Experiment Setup | Yes | For the Faith-Shap interaction index, we use Eqn. (20) and solve the corresponding linear regression problem with ℓ1 regularization, with regularization parameter α = 10^{-3} and α = 10^{-6} for the simplified IMDB dataset and the bank dataset, respectively. We used 4000 samples to estimate both Faithful Shapley Interaction indices and Shapley Taylor indices. We use Lasso with regularization parameter α = 0.001 to estimate Faithful Shapley Interaction indices and a permutation-based sampling method to estimate the highest-order Shapley Taylor indices (ℓ = 2). |
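
The permutation-based sampling procedure quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names, the `num_permutations` and `seed` parameters, and the 0-based feature indexing are assumptions made here. The quoted loop `for t = 1, 2, ...` is unbounded, so the sketch stops after a fixed number of permutations. The discrete derivative δ_S v(T) is assumed to be the usual alternating sum over subsets W ⊆ S of (−1)^(|S|−|W|) v(T ∪ W).

```python
import itertools
import random


def discrete_derivative(v, S, T):
    """Assumed definition: sum over W subset of S of (-1)^(|S|-|W|) * v(T | W)."""
    base = frozenset(T)
    total = 0.0
    for r in range(len(S) + 1):
        sign = (-1) ** (len(S) - r)
        for W in itertools.combinations(S, r):
            total += sign * v(base | frozenset(W))
    return total


def sample_top_order_shapley_taylor(v, d, order, num_permutations, seed=0):
    """Estimate top-order Shapley Taylor indices for all subsets of size `order`.

    v     : value function taking a frozenset of feature indices in range(d)
    d     : number of features
    order : maximum interaction order (the paper's l)
    """
    rng = random.Random(seed)
    subsets = list(itertools.combinations(range(d), order))
    sums = {S: 0.0 for S in subsets}

    for _ in range(num_permutations):
        # Draw a random ordering pi of the d features.
        pi = list(range(d))
        rng.shuffle(pi)
        position = {feature: idx for idx, feature in enumerate(pi)}

        for S in subsets:
            # Position of the leftmost element of S in pi.
            k = min(position[i] for i in S)
            # T is the set of predecessors of that element in pi.
            T = pi[:k]
            sums[S] += discrete_derivative(v, S, T)

    # Every subset is visited once per permutation, so count[S] equals
    # num_permutations for all S.
    return {S: total / num_permutations for S, total in sums.items()}


if __name__ == "__main__":
    # Hypothetical toy game: value 1 only when features 0 and 1 are both present.
    def toy_game(S):
        return 1.0 if {0, 1} <= S else 0.0

    print(sample_top_order_shapley_taylor(toy_game, d=3, order=2, num_permutations=100))
```

Each permutation costs 2^ℓ evaluations of `v` per subset, so in practice a cache over coalitions would likely be worthwhile when `v` wraps an expensive model such as the BERT classifier described above.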