ProtoAttend: Attention-Based Prototypical Learning
Authors: Sercan O. Arik, Tomas Pfister
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the results of ProtoAttend for image, text, and tabular data classification problems with different encoder architectures (see Supplementary Material for details). ProtoAttend yields superior results in three high-impact problems without sacrificing accuracy of the original model: (1) it enables high-quality interpretability that outputs samples most relevant to the decision-making (i.e., a sample-based interpretability method); (2) it achieves state-of-the-art confidence estimation by quantifying the mismatch across prototype labels; and (3) it obtains state-of-the-art results in distribution mismatch detection. |
| Researcher Affiliation | Industry | Sercan O. Arık, Google Cloud AI, Sunnyvale, CA; Tomas Pfister, Google Cloud AI, Sunnyvale, CA |
| Pseudocode | Yes | Appendix A: Pseudo-code for training; Algorithm 1: Pseudo-code of ProtoAttend training |
| Open Source Code | No | No explicit statement about open-source code release or a link to a code repository is provided in the paper. |
| Open Datasets | Yes | We demonstrate the results of ProtoAttend for image, text, and tabular data classification problems with different encoder architectures (see Supplementary Material for details). ... For image encoding, unless specified, we use the standard ResNet model (He et al., 2016). ... Table 2: ...Fashion-MNIST... Table 4: ...for CIFAR-10. Figure 5: Example inputs and ProtoAttend prototypes for DBPedia... Figure 6: Example inputs and ProtoAttend prototypes for Adult Census Income... |
| Dataset Splits | Yes | We construct the training and validation data set using 15122 images (13511 benign and 1611 melanoma cases), and the evaluation data set using 3203 images (2867 benign and 336 melanoma). While training, benign cases are undersampled in each batch to a 0.6 ratio, including candidate database sets at training and inference. |
| Hardware Specification | No | Training database size is chosen to fit the model to the memory of a single GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | For the baseline encoder, the initial learning rate is chosen as 0.002 and exponential decay with a rate of 0.9 is applied every 6k iterations. The model is trained for 84k iterations. ... All models use a batch size of 128 and gradient clipping above 20. |
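The abstract describes the core mechanism: decisions are formed from samples (prototypes) selected by attention, and confidence is derived from agreement across prototype labels. A minimal sketch of that idea, assuming dot-product attention and one-hot labels (function and variable names are hypothetical, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def proto_attend_predict(query_emb, db_embs, db_labels, num_classes):
    """Sketch of attention-based prototypical prediction: weight database
    samples by attention and predict from the mixture of their labels."""
    # Dot-product attention between the query embedding and each database embedding
    weights = softmax(db_embs @ query_emb)
    # One-hot encode the prototype labels
    one_hot = np.eye(num_classes)[db_labels]
    # Prediction: attention-weighted convex combination of prototype labels
    probs = weights @ one_hot
    # Confidence: agreement among the attended prototypes' labels
    confidence = probs.max()
    return probs.argmax(), confidence, weights
```

A query that attends mostly to same-class prototypes yields high confidence; attention spread across conflicting labels (a "mismatch across prototype labels") lowers it, which is the signal the paper reports using for confidence estimation.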
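The dataset-splits row reports undersampling benign cases to a 0.6 per-batch ratio. One way to realize that, assuming simple random sampling without replacement (the 0.6 ratio is from the paper; everything else here is an assumption):

```python
import random

def undersample_batch(benign_idx, melanoma_idx, batch_size, benign_ratio=0.6, seed=0):
    """Sketch: draw a batch whose benign fraction is fixed at benign_ratio,
    undersampling the majority (benign) class."""
    rng = random.Random(seed)
    n_benign = int(round(batch_size * benign_ratio))
    n_melanoma = batch_size - n_benign
    batch = rng.sample(benign_idx, n_benign) + rng.sample(melanoma_idx, n_melanoma)
    rng.shuffle(batch)  # avoid class-ordered batches
    return batch
```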
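The reported schedule (initial rate 0.002, exponential decay 0.9 every 6k iterations, gradient clipping above 20) can be sketched framework-independently. A staircase decay and clip-by-global-norm are assumptions; the paper states only the rates and the clipping threshold:

```python
def learning_rate(step, base_lr=0.002, decay_rate=0.9, decay_steps=6000):
    """Staircase exponential decay: multiply by decay_rate every decay_steps."""
    return base_lr * decay_rate ** (step // decay_steps)

def clip_gradients(grads, clip_norm=20.0):
    """Rescale gradients so their global L2 norm does not exceed clip_norm."""
    total_norm = sum(g ** 2 for g in grads) ** 0.5
    if total_norm > clip_norm:
        grads = [g * clip_norm / total_norm for g in grads]
    return grads
```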