Interpreting Neurons in Deep Vision Networks with Language Models
Authors: Nicholas Bai, Rahul Ajay Iyer, Tuomas Oikarinen, Akshay R. Kulkarni, Tsui-Wei Weng
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted extensive qualitative and quantitative analysis to show that DnD outperforms prior work by providing higher quality neuron descriptions. |
| Researcher Affiliation | Academia | Nicholas Bai (EMAIL), UC San Diego; Rahul A. Iyer (EMAIL), UT Austin; Tuomas Oikarinen (EMAIL), UC San Diego; Akshay Kulkarni (EMAIL), UC San Diego; Tsui-Wei Weng (EMAIL), UC San Diego |
| Pseudocode | Yes | An overview of Describe-and-Dissect (DnD) and these 3 steps are illustrated in Figure 2. ... The algorithm consists of 4 substeps. |
| Open Source Code | Yes | Our code and data are available at https://github.com/Trustworthy-ML-Lab/Describe-and-Dissect. |
| Open Datasets | Yes | ResNet-50 and ResNet-18 (He et al., 2016) trained on ImageNet (Russakovsky et al., 2015) and Places365 (Zhou et al., 2016) respectively. ... We dissected both a ResNet-50 network pretrained on ImageNet-1K and a ResNet-18 trained on Places365, using the union of the ImageNet validation dataset and Broden (Bau et al., 2017) as our probing dataset. ... Tile2Vec (Jean et al., 2019) utilizes a modified ResNet-18 backbone trained to minimize triplet loss between anchor, neighbor, and distant land tiles from the NAIP dataset (Claire Boryan & Craig, 2011). ... We also evaluate a ResNet-50 model trained on labeled EuroSAT images (Helber et al., 2019) with 10 land cover classes. |
| Dataset Splits | Yes | We use the union of the ImageNet validation dataset and Broden as Dprobe and compare to Network Dissection (Bau et al., 2017), MILAN (Hernandez et al., 2022), and CLIP-Dissect (Oikarinen & Weng, 2023) as baselines. ... To compare the performance, following Oikarinen & Weng (2023), we use our model to describe the final-layer neurons of ResNet-50 (where we know their ground-truth role) and compare description similarity to the class name that neuron is detecting, as discussed in Section 4.2. |
| Hardware Specification | Yes | One limitation of Describe-and-Dissect is the relatively high computational cost, taking on average about 38.8 seconds per neuron with a Tesla V100 GPU. |
| Software Dependencies | No | The first model is Bootstrapping Language-Image Pretraining (BLIP) (Li et al., 2022), which is an image-to-text model... The second model is GPT-3.5 Turbo, which is a transformer model developed by OpenAI... The third model is Stable Diffusion (Rombach et al., 2022)... |
| Experiment Setup | Yes | The top K most highly activating images for a neuron n are collected in set I, \|I\| = K, by selecting the K images xᵢ ∈ Dprobe ∪ Dcropped with the largest g(Aₖ(xᵢ)). ... For the purposes of our experiments, we generate N = 5 candidate concepts unless otherwise mentioned. ... For the purposes of the experiments in this paper, we set Q = 10. ... For our experiments, we use t = 10. In practice, Rj is computed as the square of the ranks over the top β = 5 ranking images for better differentiation between scores, Rj = {(Rⱼⁱ)² : i ≤ β}. ... For both models we evaluated 4 of the intermediate layers (end of each residual block), with 200 randomly chosen neurons per layer for ResNet-50 and 50 per layer for ResNet-18. |
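The experiment-setup quotes describe two concrete computations: selecting the top-K most highly activating probe images for a neuron, and aggregating a candidate concept's per-image ranks as squared ranks over the top β = 5 scoring images. The snippet below is a minimal illustrative sketch of those two steps only, not the authors' released implementation; the function names, the use of a plain activation vector in place of g(Aₖ(xᵢ)), and the assumption that a lower rank total indicates a better-matching concept are all ours.

```python
import numpy as np

def top_k_activating_images(activations, k=10):
    """Return indices of the K probe images with the largest
    (already-summarized) activation for a single neuron.

    `activations` stands in for g(A_k(x_i)) over the probe set."""
    return np.argsort(activations)[::-1][:k]

def concept_score(ranks, beta=5):
    """Aggregate a candidate concept's per-image ranks.

    Per the quoted setup, ranks are squared over the top beta = 5
    ranking images for better differentiation between scores.
    We assume rank 1 is best, so a smaller total is a better match."""
    top = sorted(ranks)[:beta]
    return sum(r ** 2 for r in top)

# Toy example: one neuron's summarized activations over 8 probe images.
acts = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6])
top = top_k_activating_images(acts, k=3)
print(top)  # images 1, 3, and 5 activate most strongly

# Ranks of one candidate concept across several scoring images.
print(concept_score([1, 2, 1, 3, 2, 5, 7], beta=5))  # 1+1+4+4+9 = 19
```

With N = 5 candidate concepts per neuron, the concept with the smallest aggregate score would be kept as the description under this sketch's convention.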