Prototypical Self-Explainable Models Without Re-training
Authors: Srishti Gautam, Ahcene Boubekki, Marina MC Höhne, Michael Kampffmeyer
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs. (Sections: 4 Evaluations; 4.1 Datasets, implementation and baselines; 4.2 Traditional evaluation of KMEx; 4.3 Quantitative evaluation of SEMs) |
| Researcher Affiliation | Academia | Srishti Gautam EMAIL, Department of Physics and Technology, UiT The Arctic University of Norway, Norway; Ahcene Boubekki EMAIL, Machine Learning and Uncertainty, Physikalisch-Technische Bundesanstalt, Germany; Marina M. C. Höhne EMAIL, Data Science in Bioeconomy, Leibniz Institute for Agriculture and Bioeconomy, Institute for Computer Science, University of Potsdam, Germany; Michael C. Kampffmeyer EMAIL, Department of Physics and Technology, UiT The Arctic University of Norway, Norway |
| Pseudocode | No | The paper describes methods and procedures in narrative text, often listing steps (e.g., in Section 3.1) but does not present them in a formally structured pseudocode block or an algorithm environment. |
| Open Source Code | Yes | The code is available at https://github.com/SrishtiGautam/KMEx |
| Open Datasets | Yes | We evaluate all methods on 7 datasets, MNIST (Lecun et al., 1998), FashionMNIST (Xiao et al., 2017) (fMNIST), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky, 2009), STL-10 (Coates et al., 2011), a subset of QuickDraw (Parekh et al., 2021) and binary classification for male and female for the CelebA dataset (Liu et al., 2015). All datasets used in this work are open-source. |
| Dataset Splits | Yes | For all datasets, we use the official training and testing splits, except for QuickDraw (Ha & Eck, 2018), for which we use a subset of 10 classes created by Parekh et al. (2021). This subset consists of the following 10 classes: Ant, Apple, Banana, Carrot, Cat, Cow, Dog, Frog, Grapes, Lion. Each of the classes contains 1000 images of size 28×28, of which 80% are used for training and the remaining 20% for testing. The MNIST (Lecun et al., 1998), fMNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009) datasets consist of 60,000 training images and 10,000 test images of size 28×28, 28×28 and 32×32, respectively. ... The number of training and testing images for CelebA are 162,770 and 19,962, respectively, of size 224×224. |
| Hardware Specification | Yes | The experiments in this work were conducted on an NVIDIA A100 GPU. These experiments are conducted using PyTorch with a more accessible NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'Zennit package' but does not provide specific version numbers for these software components, which is required for a reproducible description. |
| Experiment Setup | Yes | A.2 Implementation details: The backbone network used for all models as well as all datasets consists of an ImageNet (Deng et al., 2009) pretrained ResNet34 (He et al., 2016). Table 10: Hyperparameter values for KMEx, ProtoPNet, FLINT and ProtoVAE for all the datasets. (This table provides values for 'No. of prototypes per class', 'No. of epochs', 'Learning rate', and various 'Loss weights' for each model and dataset.) |
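The QuickDraw subset described in the Dataset Splits row (10 classes × 1000 images, 80% train / 20% test) can be reproduced arithmetically. The sketch below is an illustrative per-class split using only the Python standard library; the function name, index layout, and fixed seed are assumptions for the example and are not taken from the authors' code, which may shuffle or partition differently.

```python
import random

def split_indices(n_per_class, n_classes, train_frac=0.8, seed=0):
    """Per-class train/test split: each class contributes train_frac of
    its images to the training set and the rest to the test set."""
    rng = random.Random(seed)  # fixed seed (illustrative assumption)
    train, test = [], []
    for c in range(n_classes):
        # assume images of class c occupy a contiguous index block
        idx = list(range(c * n_per_class, (c + 1) * n_per_class))
        rng.shuffle(idx)
        cut = int(train_frac * n_per_class)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test

# 10 classes x 1000 images, as in the QuickDraw subset:
train_idx, test_idx = split_indices(n_per_class=1000, n_classes=10)
print(len(train_idx), len(test_idx))  # 8000 2000
```

With these numbers each class yields 800 training and 200 test images, matching the 80/20 proportion stated in the paper's quoted split description.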