Discovering Influential Neuron Path in Vision Transformers

Authors: Yifan Wang, Yifei Liu, Yingdong Shi, Changming Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the superiority of our method in finding the most influential neuron path along which the information flows, over the existing baseline solutions. Additionally, the neuron paths illustrate that vision Transformers exhibit a specific inner working mechanism for processing visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showing that the found neuron paths preserve the model's capability on downstream tasks, which may also shed some light on real-world applications like model pruning. The project website including implementation code is available at https://foundation-model-research.github.io/NeuronPath/. ... We have conducted several quantitative and qualitative experiments on the found neuron path, illustrating the significant role it plays and the advantage of our solution in discovering and explaining the critical parts of vision Transformer models.
Researcher Affiliation | Collaboration | 1ShanghaiTech University, 2Tencent PCG
Pseudocode | Yes | Algorithm 1: Layer-progressive Neuron Locating Algorithm ... Algorithm 2: Greedy Search-Based Influence Pattern Algorithm
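The two named algorithms suggest a greedy, layer-by-layer construction of a neuron path. A minimal sketch of that search pattern, assuming a hypothetical `score_fn` standing in for the paper's joint influence measure (the actual scoring in the paper is its JAS-based criterion, not shown here):

```python
import numpy as np

def layer_progressive_search(num_layers, neurons_per_layer, score_fn):
    """Greedily extend a neuron path one layer at a time.

    score_fn(path) is a hypothetical stand-in for the paper's joint
    influence measure: it takes a partial path (one neuron index per
    visited layer) and returns a scalar score to maximize.
    """
    path = []
    for layer in range(num_layers):
        best_neuron, best_score = None, -np.inf
        # Try extending the current path with each candidate neuron
        # in this layer, keeping the extension with the highest score.
        for neuron in range(neurons_per_layer):
            score = score_fn(path + [neuron])
            if score > best_score:
                best_neuron, best_score = neuron, score
        path.append(best_neuron)
    return path
```

This greedy scheme evaluates `num_layers * neurons_per_layer` partial paths instead of the exponential number of full paths, which is the usual motivation for layer-progressive search.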
Open Source Code | Yes | The project website including implementation code is available at https://foundation-model-research.github.io/NeuronPath/.
Open Datasets | Yes | By applying the method to two types of vision Transformer models using different pretraining paradigms, supervised ViT (ViT-B-16) (Dosovitskiy et al., 2021) and self-supervised Masked AutoEncoder (MAE-B-16) (He et al., 2022), we have derived an unexpected discovery, as illustrated in Figure 2. Despite both models being pretrained and finetuned on the same dataset (ImageNet (Deng et al., 2009)) with almost identical model structures
Dataset Splits | Yes | For a selected model, we retain the top t ∈ {1, 5, 10, 30, 50} most influential neurons per layer within the neuron paths discovered by our Neuron Path method. Following the statistical procedure outlined in Section 4.3, we identified neuron paths for each category using 80% of the image data, then transferred the results and conducted the pruning experiment on the remaining 20% of the image data, establishing a generalization setting.
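The pruning step above keeps only the top-t most influential neurons per layer. A minimal sketch of building such a keep-mask, assuming per-neuron influence scores are already available as an array (how the paper derives those scores, via its Neuron Path method on the 80% split, is not reproduced here):

```python
import numpy as np

def top_t_neuron_mask(influence, t):
    """Build a boolean keep-mask retaining the t most influential
    neurons in each layer.

    influence: array of shape (num_layers, num_neurons) holding
    hypothetical per-neuron influence scores. The returned mask can
    then be applied to held-out data to test generalization, as in
    the 80%/20% setting described above.
    """
    mask = np.zeros(influence.shape, dtype=bool)
    for layer, scores in enumerate(influence):
        keep = np.argsort(scores)[-t:]  # indices of the t largest scores
        mask[layer, keep] = True
    return mask
```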
Hardware Specification | Yes | All experiments are run on NVIDIA A40 GPUs with a batch size of 10 and sampling step m = 20, using the ImageNet-1k validation set.
Software Dependencies | No | The paper does not explicitly mention specific software libraries or frameworks with version numbers for reproducibility.
Experiment Setup | Yes | All experiments are run on NVIDIA A40 GPUs with a batch size of 10 and sampling step m = 20, using the ImageNet-1k validation set. ... For the following experiments, we mainly utilize 3 ViT settings and 1 MAE setting: ViT-B-16, ViT-B-32, ViT-L-32, and MAE-B-16 as the target models. Details of these models are in Appendix C.1. ... For the calculation of JAS, we set the sampling step m = 20 in Eq. (3) for the following experiments.
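A sampling step m in an attribution formula like Eq. (3) typically denotes an m-step Riemann approximation of a path integral, as in integrated-gradients-style methods. A generic sketch under that assumption (the exact JAS formula is not reproduced here; `grad_fn`, `baseline`, and `x` are hypothetical names):

```python
import numpy as np

def riemann_attribution(grad_fn, baseline, x, m=20):
    """Approximate an integrated-gradient-style attribution with an
    m-step Riemann sum, one plausible reading of the sampling step m.

    grad_fn: hypothetical callable returning the gradient of the
    model output with respect to its input at a given point.
    """
    total = np.zeros_like(x)
    for k in range(1, m + 1):
        # Interpolate between the baseline and the actual input.
        point = baseline + (k / m) * (x - baseline)
        total += grad_fn(point)
    # Average the sampled gradients and scale by the input difference.
    return (x - baseline) * total / m
```

Larger m gives a closer approximation of the underlying integral at proportionally higher cost, which is the trade-off the fixed setting m = 20 resolves.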