Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts

Authors: Jonathan Crabbé, Pau Rodriguez, Vaishaal Shankar, Luca Zappella, Arno Blaas

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we bridge this gap by probing the representation spaces of 16 robust zero-shot CLIP vision encoders with various backbones (ResNets and ViTs) and pretraining sets (OpenAI, LAION-400M, LAION-2B, YFCC15M, CC12M and DataComp), and comparing them to the representation spaces of less robust models with identical backbones, but different (pre)training sets or objectives (CLIP pretraining on ImageNet-Captions, and supervised training or finetuning on ImageNet). Through this analysis, we generate three novel insights.
Researcher Affiliation | Collaboration | Jonathan Crabbé (EMAIL), University of Cambridge (work done while at Apple); Pau Rodríguez (EMAIL), Apple; Vaishaal Shankar (EMAIL), Apple; Luca Zappella (EMAIL), Apple; Arno Blaas (EMAIL), Apple
Pseudocode | No | The paper describes mathematical formulas for the contrastive loss and zero-shot classifier logits, but does not provide structured pseudocode or algorithm blocks. Methods are described within the regular text and equations.
Open Source Code | No | The paper mentions leveraging models from the OpenCLIP repository (Ilharco et al., 2021) and checkpoints provided by Fang et al. (2022). However, it does not explicitly state that the authors are releasing their own code for the methodology described in this paper.
Open Datasets | Yes | The paper makes extensive use of well-known public datasets and cites them: "ImageNet (Deng et al., 2009)", "ImageNet-V2 (Recht et al., 2019)", "ImageNet-R (Hendrycks et al., 2021a)", "ImageNet-Sketch (Wang et al., 2019)", "ObjectNet (Barbu et al., 2019)", "ImageNet-A (Hendrycks et al., 2021b)", "YFCC-15M (Thomee et al., 2016; Radford et al., 2021)", "CC-12M (Changpinyo et al., 2021)", "LAION-400M, LAION-2B (Schuhmann et al., 2022)", "DataComp (Cherti et al., 2023)", and the "Broden dataset (Bau et al., 2017)".
Dataset Splits | Yes | We use the ImageNet test set to produce activation vectors h^(n) = f_v(x^(n)) ∈ R^{d_H} for each image x^(n) ∈ R^{d_X} fed to the encoder. We report the average kurtosis over the ImageNet test set... To obtain the finetuned CLIP models... finetune these models for 10 epochs on the ImageNet training set... We train these modified ResNet models from scratch for 90 epochs on the ImageNet training set...
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. It only generally refers to training models.
Software Dependencies | No | The paper mentions the "PyTorch-Model-Compare package (Subramanian, 2021)" and "torchvision (TorchVision maintainers and contributors, 2016)" but does not specify version numbers for these or other key software components like PyTorch or Python itself.
Experiment Setup | Yes | To obtain the finetuned CLIP models... we then finetune these models for 10 epochs on the ImageNet training set, using a batch size of 256 and a learning rate of 3×10⁻⁵ with a cosine annealing learning rate scheduler and a warm-up of 500 steps. We use the AdamW optimizer and set the weight decay to 0.1. For the supervised ImageNet models... We train these modified ResNet models from scratch for 90 epochs on the ImageNet training set, using a batch size of 1024. We use AdamW, and a learning rate schedule decaying from 10⁻³ to 10⁻⁴ after 30 epochs and to 10⁻⁵ after 60 epochs (with a warm-up period of 5,000 steps). We set weight decay to 10⁻². We use the standard augmentations of horizontal flip with random crop as well as label smoothing.
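The two learning-rate schedules quoted above can be sketched in pure Python. This is an illustrative reconstruction, not the authors' code: the linear warm-up shape and the zero floor of the cosine annealing are assumptions, and the total step count is left as a parameter since the paper gives epochs and batch size rather than steps.

```python
import math

def finetune_lr(step, total_steps, peak_lr=3e-5, warmup_steps=500):
    """CLIP finetuning schedule as described in the paper:
    linear warm-up over 500 steps to a peak of 3e-5, then cosine
    annealing. Decay to zero is an assumption."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def supervised_lr(epoch):
    """Step decay for the from-scratch ResNet runs: 1e-3 initially,
    1e-4 after epoch 30, 1e-5 after epoch 60 (the 5,000-step warm-up
    is omitted from this sketch)."""
    if epoch < 30:
        return 1e-3
    if epoch < 60:
        return 1e-4
    return 1e-5
```

In practice these correspond to `torch.optim.lr_scheduler.CosineAnnealingLR` (with a separate warm-up) and a step/multi-step scheduler driving AdamW with the quoted weight decays.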