Explaining by Removing: A Unified Framework for Model Explanation
Authors: Ian Covert, Scott Lundberg, Su-In Lee
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have thus far analyzed removal-based explanations from a primarily theoretical standpoint, so we now conduct experiments to provide a complementary empirical perspective. Our experiments aim to accomplish three goals: 1. Implement and compare many new methods by filling out the space of removal-based explanations (Figure 2). 2. Demonstrate the advantages of removing features by marginalizing them out using their conditional distribution, an approach that we showed yields information-theoretic explanations (Section 8). 3. Verify the existence of relationships between various explanation methods. Specifically, explanations may be similar if they use (i) summary techniques that are probabilistic values of the same cooperative game (Section 7), or (ii) feature removal strategies that are approximately equivalent (Section 8.2). |
| Researcher Affiliation | Collaboration | Ian C. Covert, Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA; Scott Lundberg, Microsoft Research, Microsoft Corporation, Redmond, WA 98052, USA; Su-In Lee, Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA |
| Pseudocode | No | The paper describes methods and concepts through definitions and mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Our implementation is available online (https://github.com/iancovert/removal-explanations), and we tested 80 total methods (68 of which are new) that span our framework. |
| Open Datasets | Yes | The census income dataset provides basic demographic information about individuals, and the task is to predict whether a person's annual income exceeds $50k. ... (Lichman et al., 2013). For the MNIST digit recognition dataset, we trained a 14-layer CNN... (LeCun et al., 2010). In our final experiment, we analyzed gene microarray data from The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga) for breast cancer (BRCA) patients whose tumors were categorized into different molecular subtypes (Berger et al., 2018). |
| Dataset Splits | No | The paper mentions training models and using a validation set for hyperparameter selection, but it does not specify explicit percentages or sample counts for training, validation, or test splits for any of the datasets. For instance, for the BRCA dataset, it states: "Due to the small dataset size (only 510 patients), we prevented overfitting by analyzing a random subset of 100 genes (details in Appendix G) and training a regularized logistic regression model." |
| Hardware Specification | No | The paper discusses various models like LightGBM and CNNs and their training, but it does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. |
| Software Dependencies | No | The paper mentions software like "LightGBM" and optimizers like "Adam", but it does not provide specific version numbers for these or any other libraries, frameworks (e.g., Python, PyTorch, TensorFlow), or system software. |
| Experiment Setup | Yes | For the census income dataset, we trained a LightGBM model with a maximum of 10 leaves per tree and a learning rate of 0.05 (Ke et al., 2017). For MNIST, we trained a 14-layer CNN consisting of convolutional layers with kernel size 3, max pooling layers, and ELU activations (Clevert et al., 2015). ... We trained the model with Adam using a learning rate of 10^-3 (Kingma and Ba, 2014). For the BRCA dataset, we trained an ℓ1-regularized logistic regression model and selected the regularization parameter using a validation set. Our surrogate models were trained as follows: For the census income data, the surrogate was an MLP with a masking layer and four hidden layers of size 128 followed by ELU activations. ... For MNIST, the surrogate was a CNN with an identical architecture to the original model (see above) except for a masking layer at the input. ... For the BRCA data, the surrogate was an MLP with two hidden layers of size 64 followed by ELU activations. |
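The removal-based recipe quoted above (remove features, evaluate the model, summarize each feature's impact) can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' implementation: `removal_importance` is an invented helper, and it removes features by resampling them from a background dataset (the marginal distribution), whereas the paper's preferred approach marginalizes features out using their conditional distribution via a trained surrogate model.

```python
import numpy as np

def removal_importance(model, x, background, n_samples=50, seed=0):
    """Score each feature of input x by how much the model's output
    changes when that feature is replaced with values drawn from a
    background sample (crude marginal-distribution removal)."""
    rng = np.random.default_rng(seed)
    base = model(x[None, :]).mean()  # prediction with all features present
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        idx = rng.integers(0, background.shape[0], size=n_samples)
        x_masked = np.tile(x, (n_samples, 1))
        x_masked[:, j] = background[idx, j]  # "remove" feature j
        scores[j] = base - model(x_masked).mean()
    return scores

# Toy linear model where only the first two features matter.
w = np.array([2.0, -1.0, 0.0, 0.0])
model = lambda X: X @ w
background = np.zeros((100, 4))
x = np.ones(4)
scores = removal_importance(model, x, background)
```

With an all-zero background the result is deterministic: the two informative features receive scores equal to their contributions (2 and -1), and the two irrelevant features score zero. The paper's surrogate models with masking layers serve the same role as the resampling step here, but approximate the conditional rather than the marginal distribution of the removed features.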