SHAP-XRT: The Shapley Value Meets Conditional Independence Testing

Authors: Jacopo Teneggi, Beepul Bharti, Yaniv Romano, Jeremias Sulam

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate our results with simulated as well as real imaging data. We now present three experiments, of increasing complexity, that showcase how the SHAP-XRT procedure can be used in practice to explain machine learning predictions, contextualizing the Shapley value from a statistical viewpoint." |
| Researcher Affiliation | Academia | Jacopo Teneggi (Department of Computer Science and Mathematical Institute for Data Science (MINDS), Johns Hopkins University); Beepul Bharti (Department of Biomedical Engineering and MINDS, Johns Hopkins University); Yaniv Romano (Departments of Electrical Engineering and of Computer Science, Technion – Israel Institute of Technology); Jeremias Sulam (Department of Biomedical Engineering and MINDS, Johns Hopkins University) |
| Pseudocode | Yes | "Algorithm 1: Shapley Explanation Randomization Test (SHAP-XRT). procedure SHAP-XRT(model f : R^n → [0, 1], sample x ∈ R^n, feature j ∈ [n], subset C ⊆ [n] \ {j}, test statistic T, number of null draws K ∈ N, number of reference samples L ∈ N)" |
| Open Source Code | Yes | "All code to reproduce experiments will be made publicly available." |
| Open Datasets | Yes | "Finally, we revisit an experiment from Teneggi et al. (2022a) on the BBBC041 dataset (Ljosa et al., 2012), which comprises 1425 images of healthy and infected human blood smears of size 1200 × 1600 pixels." The dataset is publicly available at https://bbbc.broadinstitute.org/BBBC041. |
| Dataset Splits | Yes | "We split the original dataset into a training and validation split using an 80/20 ratio, respectively. This way, we train our model on 589 positive and 608 negative images, and validate on 112 positive and 116 negative images." |
| Hardware Specification | Yes | "All experiments were run on an NVIDIA Quadro RTX 5000 with 16 GB of memory on a private server with 96 CPU cores." |
| Software Dependencies | Yes | "All scripts were run on PyTorch 1.11.0, Python 3.8.13, and CUDA 10.2." |
| Experiment Setup | Yes | "We train both models for one epoch on m i.i.d. samples and a batch size of 64." "We use Adam (Kingma & Ba, 2014) with a learning rate of 0.001, and SGD with a learning rate of 0.01, for f_CNN and f_FCN respectively, to achieve optimal validation accuracy." "We optimize all parameters of the network for 25 epochs using binary cross-entropy loss and the Adam optimizer, with a learning rate of 0.0001 and learning rate decay of 0.2 every 10 epochs." |
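The Algorithm 1 signature quoted in the Pseudocode row can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: the helper names (`shap_xrt_pvalue`, `sample_xj_null`, `sample_reference`) are hypothetical, and for simplicity the null values of feature j come from an assumed known sampler, whereas the randomization test proper requires draws from the conditional law of X_j given X_C.

```python
import math
import random

def shap_xrt_pvalue(f, x, j, C, sample_xj_null, sample_reference, K=200, L=100, seed=0):
    """One-sided randomization p-value for feature j given coalition C (illustrative sketch).

    f               -- model mapping a feature list to [0, 1]
    x               -- the sample being explained
    sample_xj_null  -- draws a null value for feature j (assumed sampler; the XRT
                       requires the conditional distribution of X_j given X_C)
    sample_reference -- draws a full reference sample for out-of-coalition features
    """
    rng = random.Random(seed)

    def masked_eval(xj_value):
        # Average the model over L reference draws, pinning the coalition C to its
        # observed values and feature j to xj_value.
        total = 0.0
        for _ in range(L):
            z = sample_reference(rng)
            for c in C:
                z[c] = x[c]
            z[j] = xj_value
            total += f(z)
        return total / L

    t_obs = masked_eval(x[j])
    exceed = sum(masked_eval(sample_xj_null(rng)) >= t_obs for _ in range(K))
    # Finite-sample valid p-value by exchangeability of the K null draws.
    return (1 + exceed) / (K + 1)

# Toy usage: feature 0 drives the model, so its p-value should be small.
model = lambda z: 1.0 / (1.0 + math.exp(-z[0]))           # output in [0, 1]
x = [5.0, 0.0, 0.0]
ref = lambda rng: [rng.gauss(0.0, 1.0) for _ in range(3)]
null = lambda rng: rng.gauss(0.0, 1.0)                    # assumed N(0, 1) null for feature 0
p = shap_xrt_pvalue(model, x, j=0, C={1}, sample_xj_null=null, sample_reference=ref)
```

In this toy run the observed statistic is roughly σ(5) ≈ 0.99, which standard normal null draws almost never exceed, so p lands near its floor of 1/(K+1).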
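The learning-rate schedule quoted in the Experiment Setup row (base rate 1e-4, decayed by a factor of 0.2 every 10 epochs over 25 epochs) can be written out explicitly. `lr_at_epoch` is a hypothetical helper; in PyTorch the same step decay corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)`.

```python
def lr_at_epoch(epoch, base_lr=1e-4, gamma=0.2, step=10):
    # Step decay: multiply the base rate by gamma once per completed `step` epochs.
    return base_lr * gamma ** (epoch // step)

# Epochs 0-9 train at 1e-4, epochs 10-19 at 2e-5, and epochs 20-24 at 4e-6.
schedule = [lr_at_epoch(e) for e in range(25)]
```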