Interpreting the Second-Order Effects of Neurons in CLIP
Authors: Yossi Gandelsman, Alexei Efros, Jacob Steinhardt
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start by analyzing the empirical behavior of second-order effects of neurons. We find that these effects have high significance in the late layers. Additionally, each neuron is highly selective: its second-order effect is significant for only a small set (about 2%) of the images. Finally, this effect can be approximated by one linear direction in the output space. These findings will help motivate our algorithm for describing output spaces of neurons with text in Section 4. We evaluate the performance on the ImageNet validation set. The classification accuracy results for the adversarial images are presented in Table 3. The success rate of our adversarial images is significantly higher than the indirect effect baseline, the similar words baseline, and the random baseline, which succeeds only accidentally. |
| Researcher Affiliation | Academia | Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt UC Berkeley EMAIL |
| Pseudocode | No | The paper describes methods using mathematical equations and prose. It does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Project page: https://yossigandelsman.github.io/clip_neurons/. This is a project page or high-level overview page, not a direct link to a source-code repository, and the paper does not contain an explicit statement about releasing code. |
| Open Datasets | Yes | To evaluate the second-order effects and their contributions to the output representation, we measure the downstream performance on the ImageNet classification task (Deng et al., 2009). We generate adversarial images for classifying between pairs of classes from CIFAR-10 (Krizhevsky, 2009). We repeat the same experiments from Section 3.3 for ViT-L-14, trained on the LAION dataset (Schuhmann et al., 2022). |
| Dataset Splits | Yes | We take D to be 5000 images from the ImageNet training set. We report zero-shot classification accuracy on the ImageNet validation set. Our model is OpenAI's ViT-B-32 CLIP, which has 12 layers. We present additional results for ViT-L-14 and for ImageNet-R (Hendrycks et al., 2021) in Appendix A.1 and Figure 10. |
| Hardware Specification | Yes | All our experiments were run on one A100 GPU. |
| Software Dependencies | No | The paper mentions "scikit-learn's implementation of orthogonal matching pursuit (Pati et al., 1993)", "LLaMA3 (Touvron et al., 2023)", "DeepFloyd IF text-to-image model (Stability AI, 2023)", and "ChatGPT (GPT-3.5)". However, it does not provide specific version numbers for the scikit-learn library or other software components, only references to models or publications. |
| Experiment Setup | Yes | We take D to be 5000 images from the ImageNet training set. Our model is OpenAI's ViT-B-32 CLIP, which has 12 layers. We experiment with m ∈ {4, 8, 16, 32, 64, 128} and the three text pools. We choose the top 100 neurons from layers 8-10 for N, and the top 25 words according to their contribution scores for prompting the LLM. We prompt LLaMA3 (Touvron et al., 2023) to generate 50 descriptions for each classification task (see prompt in Appendix A.7). We then filter out descriptions that include the class name and choose 10 random descriptions. We generate 10 images for each description with the DeepFloyd IF text-to-image model (Stability AI, 2023). This results in 100 images per experiment. We repeat the experiment 3 times and manually remove images that include c2 objects or do not include c1 objects. Binarizing them yields a strong zero-shot image segmenter by applying a threshold of 0.5. |
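The sparse-decomposition step quoted above (approximating a neuron's output-space direction with m text embeddings via orthogonal matching pursuit, with m ∈ {4, 8, 16, 32, 64, 128}) can be sketched as follows. This is an illustrative toy implementation of OMP over synthetic unit-norm vectors standing in for CLIP text embeddings, not the paper's code; the function name `omp` and all data here are invented for the example.

```python
import numpy as np

def omp(dictionary, target, m):
    """Greedy orthogonal matching pursuit: approximate `target` as a
    sparse combination of `m` rows of `dictionary` (one row per atom)."""
    residual = target.copy()
    selected = []
    for _ in range(m):
        # Pick the atom most correlated with the current residual.
        scores = np.abs(dictionary @ residual)
        scores[selected] = -np.inf          # never reselect an atom
        selected.append(int(np.argmax(scores)))
        # Re-fit coefficients on all selected atoms via least squares.
        sub = dictionary[selected]          # shape (k, d)
        coefs, *_ = np.linalg.lstsq(sub.T, target, rcond=None)
        residual = target - sub.T @ coefs
    return selected, coefs

rng = np.random.default_rng(0)
# 100 synthetic unit-norm "text embeddings" in a 32-dim space.
atoms = rng.normal(size=(100, 32))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)
# A target direction built from a small hidden support.
y = atoms[[3, 17, 42, 85]].T @ np.array([1.0, 0.8, -0.5, 0.3])

idx, coefs = omp(atoms, y, m=4)
print(sorted(idx), np.linalg.norm(y - atoms[idx].T @ coefs))
```

In the paper this role is played by scikit-learn's `OrthogonalMatchingPursuit`; the toy version above only shows the greedy select-and-refit loop that makes the m-word decomposition sparse.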