Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
Authors: Tuomas Oikarinen, Ge Yan, Tsui-Wei Weng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform two versions of these tests, Experimental on real neurons across diverse settings, and Theoretical on ideal neurons described below. ... we perform an additional comparison between evaluation metrics by empirically comparing how well they perform on neurons where we know their ground truth function |
| Researcher Affiliation | Academia | 1CSE, UC San Diego, CA, USA 2HDSI, UC San Diego, CA, USA. Correspondence to: Tuomas Oikarinen <EMAIL>, Tsui-Wei Weng <EMAIL>. |
| Pseudocode | No | The paper describes various mathematical formulations and evaluation methods but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and results are publicly available at https://github.com/Trustworthy-ML-Lab/NeuronEval. |
| Open Datasets | Yes | We evaluated vision models across 3 datasets: ImageNet, Places365 and CUB200, while language models were evaluated on a subset of OpenWebText (Gokaslan et al., 2019). ... The ImageNet (Deng et al., 2009), Places (Zhou et al., 2017) and GPT-2 (Radford et al., 2019) models were pretrained. ... For CLIP, we used the pretrained model from (Radford et al., 2021), and then learned a linear probe on top of frozen image embeddings to minimize binary cross-entropy loss on the training split of CUB200 (Wah et al., 2011) |
| Dataset Splits | Yes | For all experiments we split a random 5% of the neurons into a validation set. For metrics that require hyperparameters such as α, we use the hyperparameters that performed the best in terms of Meta-AUPRC on the validation split for each setting. We then report performance on the remaining 95% of neurons. ... For CLIP, we used the pretrained model from (Radford et al., 2021), and then learned a linear probe on top of frozen image embeddings to minimize binary cross-entropy loss on the training split of CUB200 (Wah et al., 2011), with early stopping using validation data. |
| Hardware Specification | No | The paper discusses various models (ViT-B-16, ResNet-50, ResNet-18, GPT-2-small, GPT-2-XL) and datasets used in experiments but does not provide specific hardware details such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | For all experiments we split a random 5% of the neurons into a validation set. For metrics that require hyperparameters such as α, we use the hyperparameters that performed the best in terms of Meta-AUPRC on the validation split for each setting. We then report performance on the remaining 95% of neurons. For all evaluations we used neuron activations after the activation function (i.e. softmax/sigmoid). ... For layer4 (after avg pool) neurons we defined the correct concept t_k as the concept that maximizes IoU with α = 0.005 similar to (Bau et al., 2017), using the class (and superclass) labels of the dataset as c_t. For these layers we fixed α = 0.005 for all metrics as that was used to determine the ground truth. |
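The ground-truth assignment quoted above (matching each neuron to the concept that maximizes IoU under an α-quantile activation threshold, following Bau et al., 2017) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function name `iou_concept_match` and the flat per-input data layout are assumptions; the paper's repository (linked above) contains the actual implementation.

```python
import numpy as np

def iou_concept_match(activations, concept_masks, alpha=0.005):
    """Assign a neuron the concept with highest IoU against its
    binarized activation pattern (Network Dissection-style).

    activations: (n_inputs,) array of neuron activations, one per input.
    concept_masks: dict mapping concept name -> (n_inputs,) boolean
        array marking inputs where that concept is present.
    alpha: fraction of inputs counted as "neuron active" -- the
        activation threshold is the (1 - alpha) quantile, so only the
        top-alpha activations are considered firing.
    """
    # Binarize: the top-alpha fraction of activations count as active.
    thresh = np.quantile(activations, 1.0 - alpha)
    active = activations > thresh

    best_concept, best_iou = None, -1.0
    for name, mask in concept_masks.items():
        inter = np.logical_and(active, mask).sum()
        union = np.logical_or(active, mask).sum()
        iou = inter / union if union > 0 else 0.0
        if iou > best_iou:
            best_concept, best_iou = name, iou
    return best_concept, best_iou
```

With α = 0.005 (as fixed in the paper for these layers), a neuron over ~50k ImageNet-scale inputs is "active" on roughly its top 250 inputs; the concept whose label mask best overlaps those inputs becomes its ground-truth explanation t_k.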