CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

Authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We probe CLIPTeX and other pre-trained models on different downstream tasks and multiple datasets using classifier or regressor probes. This helps us understand whether training with hard pseudo-labels from experts can improve the effectiveness of CLIP's image representations across different vision tasks. Experiments with multiple probes on a variety of vision tasks and datasets (e.g., segmentation on PASCAL VOC and ADE20K, detection on COCO, depth estimation on NYU-v2, classification on ImageNet-1k and Places365, and surface normal estimation on NYU-v2) demonstrate the effectiveness of CLIPTeX.
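Probing, as described above, trains a small head on frozen features to measure representation quality. A minimal sketch of a linear (softmax-regression) probe on precomputed embeddings, assuming features have already been extracted; all names and hyperparameters here are illustrative, not the paper's:

```python
import numpy as np

def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=200, seed=0):
    """Fit a softmax linear classifier on frozen image features via gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy check: linearly separable 2-D "features" standing in for CLIP embeddings
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
W, b = train_linear_probe(X, y, num_classes=2)
preds = (X @ W + b).argmax(axis=1)
```

The same pattern extends to the regressor probes mentioned above (e.g., for depth) by swapping the softmax cross-entropy for a regression loss.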
Researcher Affiliation | Collaboration | Mohammadreza Salehi (University of Washington); Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel (Apple); Ali Farhadi (Allen Institute for Artificial Intelligence); Mohammad Rastegari, Sachin Mehta (Apple)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. The methodology is described in natural language and mathematical formulas.
Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We finetune pre-trained CLIP with and without pseudo-labels on CC3M (Sharma et al., 2018) for 30 epochs on 64 A100 GPUs. Semantic segmentation: we use PASCAL VOC (Everingham et al., 2010) with 20 classes and ADE20K (Zhou et al., 2019) with 150 classes. Object detection and instance segmentation: we use the COCO dataset. Monocular depth estimation: we use the NYU-v2 (Silberman et al., 2012) dataset as our depth estimation benchmark. Image classification: we evaluate on two standard image classification datasets, i.e., ImageNet (Russakovsky et al., 2015) and Places365 (Zhou et al., 2017). Retrieval: we evaluate on Flickr-30k (Young et al., 2014).
Dataset Splits | Yes | Following standard convention, we report accuracy on the validation sets of these datasets in terms of mean intersection over union (mIoU). Following standard convention, we evaluate accuracy on COCO's validation set in terms of mean average precision (mAP). We use absolute relative error as the metric for evaluation on the validation set, and evaluate on the official test set of NYU-v2. We use top-1 accuracy on the validation set as an evaluation metric.
Hardware Specification | Yes | Therefore, to show the efficacy of our approach, we finetune pre-trained CLIP with and without pseudo-labels on CC3M (Sharma et al., 2018) for 30 epochs on 64 A100 GPUs.
Software Dependencies | No | The paper mentions various models and architectures (e.g., Mask R-CNN, DPT, NLL-AngMF, DeepLabv3, PSPNet, SSD) but does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Hyperparameters used during training and probing CLIPTeX and other models are given in Table 8 and Table 9, respectively. For selecting λclip and λtask (where task ∈ {depth, seg, surface normal}) in Eq. (1), a linear search over the values 0.1, 1.0, and 10 was used for each task. We found that λclip = λtask = 1.0 worked well for all tasks except segmentation, where λseg = 0.1 delivered the best or close to the best performance. So we set these hyper-parameters in our experiments.
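The objective in Eq. (1) combines the CLIP loss with task-specific pseudo-label losses through the weights λclip and λtask. A minimal sketch of such a weighted sum, assuming a simple additive combination (function and argument names are illustrative; the default weights mirror the choices reported above):

```python
def combined_loss(loss_clip, task_losses, lambda_clip=1.0, lambda_task=None):
    """Weighted sum of the CLIP loss and per-task pseudo-label losses.

    `lambda_task` maps task name -> weight. The defaults reflect the
    reported hyperparameters: 1.0 for all tasks except segmentation (0.1).
    """
    if lambda_task is None:
        lambda_task = {"depth": 1.0, "seg": 0.1, "surface_normal": 1.0}
    total = lambda_clip * loss_clip
    for task, loss in task_losses.items():
        total += lambda_task[task] * loss
    return total

# Example with scalar stand-ins for batch-averaged losses
total = combined_loss(0.5, {"depth": 0.2, "seg": 1.0, "surface_normal": 0.3})
```

In a training loop the same expression would be applied to tensor-valued losses before the backward pass; only the weighting logic is shown here.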