Neuron-based explanations of neural networks sacrifice completeness and interpretability
Authors: Nolan Simran Dey, Eric Taylor, Alexander Wong, Bryan P. Tripp, Graham W. Taylor
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By examining two quantitative measures of completeness and conducting a user study to measure interpretability, we show the most important principal components provide more complete and interpretable explanations than the most important neurons. Much of the activation variance may be explained by examining relatively few high-variance PCs, as opposed to studying every neuron. These principal components also strongly affect network function, and are significantly more interpretable than neurons. |
| Researcher Affiliation | Collaboration | Nolan Dey (Cerebras Systems, University of Waterloo, Vector Institute); Eric Taylor (Borealis AI, Vector Institute); Alexander Wong (University of Waterloo, Apple); Bryan Tripp (University of Waterloo); Graham W. Taylor (University of Guelph, Vector Institute) |
| Pseudocode | No | The paper describes its methodology in text and figures (Figure 1 provides an overview), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Interactive demo and code available at https://ndey96.github.io/neuron-explanations-sacrifice. |
| Open Datasets | Yes | Our method can be applied to feed-forward networks as well as CNNs and we expect that it can be adapted to other architectures including Vision Transformers (Dosovitskiy et al., 2021). While our methods are widely applicable, we focus this paper on studying AlexNet (Krizhevsky, 2014) pretrained on ImageNet (Deng et al., 2009; Paszke et al., 2019) because it is studied in several related explainability works (Bau et al., 2017; Fong & Vedaldi, 2018; Mu & Andreas, 2020; Zhou et al., 2018; Rajpal et al., 2023) and its limited depth makes it feasible for us to study each layer in detail in a user study. |
| Dataset Splits | Yes | We performed PCA on a large representative sample of activations obtained by forward propagating every image in the ImageNet training set through a DNN up to a specified layer. |
| Hardware Specification | Yes | Our computations were performed on machines with an 8 core Intel Xeon CPU, NVIDIA T4 GPU, and 64 GB of RAM. |
| Software Dependencies | No | The paper mentions software like scikit-learn (Pedregosa et al., 2011), PyTorch (Paszke et al., 2019), and Faiss library (Douze et al., 2024), but does not specify their version numbers. |
| Experiment Setup | No | The paper focuses on analyzing pretrained AlexNet and does not provide specific hyperparameters or system-level training settings for the neural network itself. It describes the setup for activation analysis and visualization, such as sampling points along basis vectors and finding nearest neighbors. |
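The activation-PCA procedure quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration using scikit-learn (which the paper cites), with random stand-in data in place of real AlexNet activations; the array shapes and the number of components are assumptions for the example, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for layer activations: in the paper, each row would be the
# flattened activation vector of one ImageNet training image at a chosen
# AlexNet layer (e.g. collected with a forward hook). Here we generate
# correlated random data so that a few components dominate the variance.
rng = np.random.default_rng(0)
n_images, n_units = 2000, 256
mixing = rng.normal(size=(n_units, n_units))
activations = rng.normal(size=(n_images, n_units)) @ mixing

# Fit PCA on the activation sample; the principal components serve as an
# alternative basis to individual neurons for building explanations.
pca = PCA(n_components=50)
pca.fit(activations)

# Fraction of activation variance captured by the top-k principal
# components -- the paper's point is that this is high for small k,
# whereas no comparably small subset of neurons is as complete.
top_k_variance = pca.explained_variance_ratio_.sum()
print(f"Top-50 PCs explain {top_k_variance:.1%} of activation variance")
```

With real activations, `activations` would be assembled layer-by-layer from forward passes over the training set before fitting, as the excerpt describes.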