ProtoAttend: Attention-Based Prototypical Learning
Authors: Sercan O. Arik, Tomas Pfister
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the results of ProtoAttend for image, text, and tabular data classification problems with different encoder architectures (see Supplementary Material for details). ProtoAttend yields superior results in three high-impact problems without sacrificing accuracy of the original model: (1) it enables high-quality interpretability that outputs samples most relevant to the decision-making (i.e., a sample-based interpretability method); (2) it achieves state-of-the-art confidence estimation by quantifying the mismatch across prototype labels; and (3) it obtains state-of-the-art results in distribution mismatch detection. |
| Researcher Affiliation | Industry | Sercan O. Arık, Google Cloud AI, Sunnyvale, CA; Tomas Pfister, Google Cloud AI, Sunnyvale, CA |
| Pseudocode | Yes | Appendix A: Pseudo-code for training; Algorithm 1: Pseudo-code of ProtoAttend training |
| Open Source Code | No | No explicit statement about open-source code release or a link to a code repository is provided in the paper. |
| Open Datasets | Yes | We demonstrate the results of ProtoAttend for image, text, and tabular data classification problems with different encoder architectures (see Supplementary Material for details). ... For image encoding, unless specified, we use the standard ResNet model (He et al., 2016). ... Table 2: ...Fashion-MNIST... Table 4: ...for CIFAR-10. Figure 5: Example inputs and ProtoAttend prototypes for DBPedia... Figure 6: Example inputs and ProtoAttend prototypes for Adult Census Income... |
| Dataset Splits | Yes | We construct the training and validation data set using 15122 images (13511 benign and 1611 melanoma cases), and the evaluation data set using 3203 images (2867 benign and 336 melanoma). While training, benign cases are undersampled in each batch to a 0.6 ratio, including candidate database sets at training and inference. |
| Hardware Specification | No | Training database size is chosen to fit the model to the memory of a single GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | For the baseline encoder, the initial learning rate is chosen as 0.002 and exponential decay with a rate of 0.9 is applied every 6k iterations. The model is trained for 84k iterations. ... All models use a batch size of 128 and gradient clipping above 20. |
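The abstract describes the core mechanism: decisions are formed from samples (prototypes) selected by attention, and confidence is derived from agreement across prototype labels. A minimal sketch of that idea, assuming dot-product attention and one-hot labels (function and variable names are hypothetical, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def proto_attend_predict(query_emb, db_embs, db_labels, num_classes):
    """Sketch of attention-based prototypical prediction: weight database
    samples by attention and predict from the mixture of their labels."""
    # Dot-product attention between the query embedding and each database embedding
    weights = softmax(db_embs @ query_emb)
    # One-hot encode the prototype labels
    one_hot = np.eye(num_classes)[db_labels]
    # Prediction: attention-weighted convex combination of prototype labels
    probs = weights @ one_hot
    # Confidence: agreement among the attended prototypes' labels
    confidence = probs.max()
    return probs.argmax(), confidence, weights
```

A query that attends mostly to same-class prototypes yields high confidence; attention spread across conflicting labels (a "mismatch across prototype labels") lowers it, which is the signal the paper reports using for confidence estimation.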
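The dataset-splits row reports undersampling benign cases to a 0.6 per-batch ratio. One way to realize that, assuming simple random sampling without replacement (the 0.6 ratio is from the paper; everything else here is an assumption):

```python
import random

def undersample_batch(benign_idx, melanoma_idx, batch_size, benign_ratio=0.6, seed=0):
    """Sketch: draw a batch whose benign fraction is fixed at benign_ratio,
    undersampling the majority (benign) class."""
    rng = random.Random(seed)
    n_benign = int(round(batch_size * benign_ratio))
    n_melanoma = batch_size - n_benign
    batch = rng.sample(benign_idx, n_benign) + rng.sample(melanoma_idx, n_melanoma)
    rng.shuffle(batch)  # avoid class-ordered batches
    return batch
```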
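The reported schedule (initial rate 0.002, exponential decay 0.9 every 6k iterations, gradient clipping above 20) can be sketched framework-independently. A staircase decay and clip-by-global-norm are assumptions; the paper states only the rates and the clipping threshold:

```python
def learning_rate(step, base_lr=0.002, decay_rate=0.9, decay_steps=6000):
    """Staircase exponential decay: multiply by decay_rate every decay_steps."""
    return base_lr * decay_rate ** (step // decay_steps)

def clip_gradients(grads, clip_norm=20.0):
    """Rescale gradients so their global L2 norm does not exceed clip_norm."""
    total_norm = sum(g ** 2 for g in grads) ** 0.5
    if total_norm > clip_norm:
        grads = [g * clip_norm / total_norm for g in grads]
    return grads
```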