Prototypical Self-Explainable Models Without Re-training
Authors: Srishti Gautam, Ahcene Boubekki, Marina MC Höhne, Michael Kampffmeyer
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs. (Sections: 4 Evaluations; 4.1 Datasets, implementation and baselines; 4.2 Traditional evaluation of KMEx; 4.3 Quantitative evaluation of SEMs) |
| Researcher Affiliation | Academia | Srishti Gautam EMAIL, Department of Physics and Technology, UiT The Arctic University of Norway, Norway; Ahcene Boubekki EMAIL, Machine Learning and Uncertainty, Physikalisch-Technische Bundesanstalt, Germany; Marina M. C. Höhne EMAIL, Data Science in Bioeconomy, Leibniz Institute for Agriculture and Bioeconomy, Institute for Computer Science, University of Potsdam, Germany; Michael C. Kampffmeyer EMAIL, Department of Physics and Technology, UiT The Arctic University of Norway, Norway |
| Pseudocode | No | The paper describes methods and procedures in narrative text, often listing steps (e.g., in Section 3.1) but does not present them in a formally structured pseudocode block or an algorithm environment. |
| Open Source Code | Yes | The code is available at https://github.com/SrishtiGautam/KMEx |
| Open Datasets | Yes | We evaluate all methods on 7 datasets, MNIST (Lecun et al., 1998), FashionMNIST (Xiao et al., 2017) (fMNIST), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky, 2009), STL-10 (Coates et al., 2011), a subset of QuickDraw (Parekh et al., 2021) and binary classification for male and female for the CelebA dataset (Liu et al., 2015). All datasets used in this work are open-source. |
| Dataset Splits | Yes | For all datasets, we use the official training and testing splits, except for QuickDraw (Ha & Eck, 2018), for which we use a subset of 10 classes created by Parekh et al. (2021). This subset consists of the following 10 classes: Ant, Apple, Banana, Carrot, Cat, Cow, Dog, Frog, Grapes, Lion. Each of the classes contains 1000 images of size 28×28, of which 80% are used for training and the remaining 20% for testing. The MNIST (Lecun et al., 1998), fMNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2009) datasets consist of 60,000 training images and 10,000 test images of size 28×28, 28×28 and 32×32, respectively. ... The number of training and testing images for CelebA are 162,770 and 19,962, respectively, of size 224×224. |
| Hardware Specification | Yes | The experiments in this work were conducted on an NVIDIA A100 GPU. These experiments are conducted using PyTorch with a more accessible NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'Zennit package' but does not provide specific version numbers for these software components, which is required for a reproducible description. |
| Experiment Setup | Yes | A.2 Implementation details: The backbone network used for all models as well as all datasets consists of an ImageNet (Deng et al., 2009) pretrained ResNet34 (He et al., 2016). Table 10: Hyperparameter values for KMEx, ProtoPNet, FLINT and ProtoVAE for all the datasets. (This table provides values for 'No. of prototypes per class', 'No. of epochs', 'Learning rate', and various 'Loss weights' for each model and dataset.) |
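The QuickDraw subset described in the Dataset Splits row (10 classes × 1000 images, 80% train / 20% test) can be reproduced arithmetically. The sketch below is an illustrative per-class split using only the Python standard library; the function name, index layout, and fixed seed are assumptions for the example and are not taken from the authors' code, which may shuffle or partition differently.

```python
import random

def split_indices(n_per_class, n_classes, train_frac=0.8, seed=0):
    """Per-class train/test split: each class contributes train_frac of
    its images to the training set and the rest to the test set."""
    rng = random.Random(seed)  # fixed seed (illustrative assumption)
    train, test = [], []
    for c in range(n_classes):
        # assume images of class c occupy a contiguous index block
        idx = list(range(c * n_per_class, (c + 1) * n_per_class))
        rng.shuffle(idx)
        cut = int(train_frac * n_per_class)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test

# 10 classes x 1000 images, as in the QuickDraw subset:
train_idx, test_idx = split_indices(n_per_class=1000, n_classes=10)
print(len(train_idx), len(test_idx))  # 8000 2000
```

With these numbers each class yields 800 training and 200 test images, matching the 80/20 proportion stated in the paper's quoted split description.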