Explaining by Removing: A Unified Framework for Model Explanation
Authors: Ian Covert, Scott Lundberg, Su-In Lee
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have thus far analyzed removal-based explanations from a primarily theoretical standpoint, so we now conduct experiments to provide a complementary empirical perspective. Our experiments aim to accomplish three goals: 1. Implement and compare many new methods by filling out the space of removal-based explanations (Figure 2). 2. Demonstrate the advantages of removing features by marginalizing them out using their conditional distribution, an approach that we showed yields information-theoretic explanations (Section 8). 3. Verify the existence of relationships between various explanation methods. Specifically, explanations may be similar if they use (i) summary techniques that are probabilistic values of the same cooperative game (Section 7), or (ii) feature removal strategies that are approximately equivalent (Section 8.2). |
| Researcher Affiliation | Collaboration | Ian C. Covert, Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA; Scott Lundberg, Microsoft Research, Microsoft Corporation, Redmond, WA 98052, USA; Su-In Lee, Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA |
| Pseudocode | No | The paper describes methods and concepts through definitions and mathematical formulations, but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | Our implementation is available online (https://github.com/iancovert/removal-explanations), and we tested 80 total methods (68 of which are new) that span our framework. |
| Open Datasets | Yes | The census income dataset provides basic demographic information about individuals, and the task is to predict whether a person's annual income exceeds $50k. ... (Lichman et al., 2013). For the MNIST digit recognition dataset, we trained a 14-layer CNN... (LeCun et al., 2010). In our final experiment, we analyzed gene microarray data from The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga) for breast cancer (BRCA) patients whose tumors were categorized into different molecular subtypes (Berger et al., 2018). |
| Dataset Splits | No | The paper mentions training models and using a validation set for hyperparameter selection, but it does not specify explicit percentages or sample counts for training, validation, or test splits for any of the datasets. For instance, for the BRCA dataset, it states: "Due to the small dataset size (only 510 patients), we prevented overfitting by analyzing a random subset of 100 genes (details in Appendix G) and training a regularized logistic regression model." |
| Hardware Specification | No | The paper discusses various models like LightGBM and CNNs and their training, but it does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. |
| Software Dependencies | No | The paper mentions software like "LightGBM" and optimizers like "Adam", but it does not provide specific version numbers for these or any other libraries, frameworks (e.g., Python, PyTorch, TensorFlow), or system software. |
| Experiment Setup | Yes | For the census income dataset, we trained a LightGBM model with a maximum of 10 leaves per tree and a learning rate of 0.05 (Ke et al., 2017). For MNIST, we trained a 14-layer CNN consisting of convolutional layers with kernel size 3, max pooling layers, and ELU activations (Clevert et al., 2015). ... We trained the model with Adam using a learning rate of 10^-3 (Kingma and Ba, 2014). For the BRCA dataset, we trained an ℓ1-regularized logistic regression model and selected the regularization parameter using a validation set. Our surrogate models were trained as follows: For the census income data, the surrogate was an MLP with a masking layer and four hidden layers of size 128 followed by ELU activations. ... For MNIST, the surrogate was a CNN with an identical architecture to the original model (see above) except for a masking layer at the input. ... For the BRCA data, the surrogate was an MLP with two hidden layers of size 64 followed by ELU activations. |
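The removal-based recipe quoted above (remove features, evaluate the model, summarize each feature's impact) can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' implementation: `removal_importance` is an invented helper, and it removes features by resampling them from a background dataset (the marginal distribution), whereas the paper's preferred approach marginalizes features out using their conditional distribution via a trained surrogate model.

```python
import numpy as np

def removal_importance(model, x, background, n_samples=50, seed=0):
    """Score each feature of input x by how much the model's output
    changes when that feature is replaced with values drawn from a
    background sample (crude marginal-distribution removal)."""
    rng = np.random.default_rng(seed)
    base = model(x[None, :]).mean()  # prediction with all features present
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        idx = rng.integers(0, background.shape[0], size=n_samples)
        x_masked = np.tile(x, (n_samples, 1))
        x_masked[:, j] = background[idx, j]  # "remove" feature j
        scores[j] = base - model(x_masked).mean()
    return scores

# Toy linear model where only the first two features matter.
w = np.array([2.0, -1.0, 0.0, 0.0])
model = lambda X: X @ w
background = np.zeros((100, 4))
x = np.ones(4)
scores = removal_importance(model, x, background)
```

With an all-zero background the result is deterministic: the two informative features receive scores equal to their contributions (2 and -1), and the two irrelevant features score zero. The paper's surrogate models with masking layers serve the same role as the resampling step here, but approximate the conditional rather than the marginal distribution of the removed features.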