Metric-Driven Attributions for Vision Transformers
Authors: Chase Walker, Sumit Jha, Rickard Ewetz
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation demonstrates the proposed MDA method outperforms 7 existing ViT attribution methods by an average of 12% across 12 attribution metrics on the ImageNet dataset for the ViT-base 16×16, ViT-tiny 16×16, and ViT-base 32×32 models. |
| Researcher Affiliation | Academia | Chase Walker¹, Sumit Kumar Jha², Rickard Ewetz¹; ¹University of Florida, ²Florida International University |
| Pseudocode | No | The paper describes its methodology in Section 3 using prose and mathematical equations (e.g., Eq. 1, 2, 3, 4, 5, 8, 10, 11, 12) and illustrations (Figure 2, 3, 4) but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is publicly available at https://github.com/chasewalker26/MDA-Metric-Driven-Attributions-for-ViT. |
| Open Datasets | Yes | We perform all experiments using PyTorch (Paszke et al., 2019) and use the ImageNet 2012 validation dataset (Russakovsky et al., 2015) and the ImageNet Segmentation dataset (Guillaumin et al., 2014). |
| Dataset Splits | Yes | We perform all experiments using PyTorch (Paszke et al., 2019) and use the ImageNet 2012 validation dataset (Russakovsky et al., 2015) and the ImageNet Segmentation dataset (Guillaumin et al., 2014). The results in Table 1 compare all 8 attribution methods over 5000 ImageNet images with 5 images per class for the perturbation metrics. |
| Hardware Specification | Yes | The experiments were run on one server with four NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch ('We perform all experiments using PyTorch (Paszke et al., 2019)') but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We employ three ViT models: ViT-base 16×16, ViT-tiny 16×16, and ViT-base 32×32 as defined in the ViT paper (Dosovitskiy et al., 2020). For these tests, all input images are 224×224px and we use a step size of 224px, for a total of 224 perturbation steps as in the original implementations. In practice, we set the parameter τ to 0.90. In practice, we employ κ = 0.005 to only strongly attribute patches with more than 0.5% model importance. A user can tune γ to their choosing, but, quantitatively, the best explanation is created with γ = 0. |
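Two numbers in the setup row above can be sanity-checked directly: a 224×224px input perturbed 224 pixels per step yields 224 steps, and κ = 0.005 keeps only patches holding more than 0.5% of total importance. The sketch below is not the authors' code; the function names (`perturbation_schedule`, `strongly_attributed`) are illustrative assumptions.

```python
# Hedged sketch (not the authors' implementation) of two settings from the
# Experiment Setup row: the perturbation step count and the kappa threshold.

def perturbation_schedule(image_px=224, step_px=224):
    """Number of perturbation steps when removing `step_px` pixels per step
    from an `image_px` x `image_px` input."""
    total = image_px * image_px          # 224 * 224 = 50176 pixels
    assert total % step_px == 0, "step size must divide the pixel count"
    return total // step_px              # 50176 / 224 = 224 steps

def strongly_attributed(patch_scores, kappa=0.005):
    """Indices of patches whose share of total importance exceeds kappa
    (0.5% of model importance, per the reported setting kappa = 0.005)."""
    total = sum(patch_scores)
    return [i for i, s in enumerate(patch_scores) if s / total > kappa]

print(perturbation_schedule())                        # 224
print(strongly_attributed([0.5, 0.3, 0.001, 0.199]))  # [0, 1, 3]
```

The per-step pixel budget (224px) equals one row of the 224×224 input, which is why the step count matches the image side length.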