SHAP-XRT: The Shapley Value Meets Conditional Independence Testing

Authors: Jacopo Teneggi, Beepul Bharti, Yaniv Romano, Jeremias Sulam

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate our results with simulated as well as real imaging data. We now present three experiments, of increasing complexity, that showcase how the SHAP-XRT procedure can be used in practice to explain machine learning predictions, contextualizing the Shapley value from a statistical viewpoint." |
| Researcher Affiliation | Academia | Jacopo Teneggi (Department of Computer Science and Mathematical Institute for Data Science (MINDS), Johns Hopkins University); Beepul Bharti (Department of Biomedical Engineering and MINDS, Johns Hopkins University); Yaniv Romano (Departments of Electrical Engineering and of Computer Science, Technion – Israel Institute of Technology); Jeremias Sulam (Department of Biomedical Engineering and MINDS, Johns Hopkins University) |
| Pseudocode | Yes | "Algorithm 1: Shapley Explanation Randomization Test (SHAP-XRT). procedure SHAP-XRT(model f : R^n → [0, 1], sample x ∈ R^n, feature j ∈ [n], subset C ⊆ [n] \ {j}, test statistic T, number of null draws K ∈ N, number of reference samples L ∈ N)" |
| Open Source Code | Yes | "All code to reproduce experiments will be made publicly available." |
| Open Datasets | Yes | "Finally, we revisit an experiment from Teneggi et al. (2022a) on the BBBC041 dataset (Ljosa et al., 2012), which comprises 1425 images of healthy and infected human blood smears of size 1200 × 1600 pixels." The dataset is publicly available at https://bbbc.broadinstitute.org/BBBC041. |
| Dataset Splits | Yes | "We split the original dataset into a training and validation split using an 80/20 ratio, respectively. This way, we train our model on 589 positive and 608 negative images, and validate on 112 positive and 116 negative images." |
| Hardware Specification | Yes | "All experiments were run on an NVIDIA Quadro RTX 5000 with 16 GB of memory on a private server with 96 CPU cores." |
| Software Dependencies | Yes | "All scripts were run on PyTorch 1.11.0, Python 3.8.13, and CUDA 10.2." |
| Experiment Setup | Yes | "We train both models for one epoch on m i.i.d. samples and a batch size of 64." "We use Adam (Kingma & Ba, 2014) with a learning rate of 0.001, and SGD with a learning rate of 0.01, for f_CNN and f_FCN respectively, to achieve optimal validation accuracy." "We optimize all parameters of the network for 25 epochs using binary cross-entropy loss and the Adam optimizer, with a learning rate of 0.0001 and learning rate decay of 0.2 every 10 epochs." |
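The Algorithm 1 signature quoted in the Pseudocode row can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: the helper names (`shap_xrt_pvalue`, `sample_xj_null`, `sample_reference`) are hypothetical, and for simplicity the null values of feature j come from an assumed known sampler, whereas the randomization test proper requires draws from the conditional law of X_j given X_C.

```python
import math
import random

def shap_xrt_pvalue(f, x, j, C, sample_xj_null, sample_reference, K=200, L=100, seed=0):
    """One-sided randomization p-value for feature j given coalition C (illustrative sketch).

    f               -- model mapping a feature list to [0, 1]
    x               -- the sample being explained
    sample_xj_null  -- draws a null value for feature j (assumed sampler; the XRT
                       requires the conditional distribution of X_j given X_C)
    sample_reference -- draws a full reference sample for out-of-coalition features
    """
    rng = random.Random(seed)

    def masked_eval(xj_value):
        # Average the model over L reference draws, pinning the coalition C to its
        # observed values and feature j to xj_value.
        total = 0.0
        for _ in range(L):
            z = sample_reference(rng)
            for c in C:
                z[c] = x[c]
            z[j] = xj_value
            total += f(z)
        return total / L

    t_obs = masked_eval(x[j])
    exceed = sum(masked_eval(sample_xj_null(rng)) >= t_obs for _ in range(K))
    # Finite-sample valid p-value by exchangeability of the K null draws.
    return (1 + exceed) / (K + 1)

# Toy usage: feature 0 drives the model, so its p-value should be small.
model = lambda z: 1.0 / (1.0 + math.exp(-z[0]))           # output in [0, 1]
x = [5.0, 0.0, 0.0]
ref = lambda rng: [rng.gauss(0.0, 1.0) for _ in range(3)]
null = lambda rng: rng.gauss(0.0, 1.0)                    # assumed N(0, 1) null for feature 0
p = shap_xrt_pvalue(model, x, j=0, C={1}, sample_xj_null=null, sample_reference=ref)
```

In this toy run the observed statistic is roughly σ(5) ≈ 0.99, which standard normal null draws almost never exceed, so p lands near its floor of 1/(K+1).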
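The learning-rate schedule quoted in the Experiment Setup row (base rate 1e-4, decayed by a factor of 0.2 every 10 epochs over 25 epochs) can be written out explicitly. `lr_at_epoch` is a hypothetical helper; in PyTorch the same step decay corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)`.

```python
def lr_at_epoch(epoch, base_lr=1e-4, gamma=0.2, step=10):
    # Step decay: multiply the base rate by gamma once per completed `step` epochs.
    return base_lr * gamma ** (epoch // step)

# Epochs 0-9 train at 1e-4, epochs 10-19 at 2e-5, and epochs 20-24 at 4e-6.
schedule = [lr_at_epoch(e) for e in range(25)]
```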