AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Authors: Jan Hendrik Metzen, Piyapat Saranrittichai, Chaithanya Kumar Mummadi
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that AutoCLIP outperforms baselines across a broad range of vision-language models, datasets, and prompt templates consistently and by up to 3 percent point accuracy. We evaluate AutoCLIP on a large number of datasets, vision-language models, and prompt templates (Section 4) as well as in a controlled setting (Section 5). |
| Researcher Affiliation | Industry | Jan Hendrik Metzen (EMAIL), Bosch Center for Artificial Intelligence, Robert Bosch GmbH; Piyapat Saranrittichai (EMAIL), Bosch Center for Artificial Intelligence, Robert Bosch GmbH; Chaithanya Kumar Mummadi (EMAIL), Bosch Center for Artificial Intelligence, Robert Bosch LLC |
| Pseudocode | Yes | Algorithm 1 Zero-Shot Classifier for a single sample x... Algorithm 2 AutoCLIP: Auto-Tuned Zero-Shot Classifier for a single sample x |
| Open Source Code | Yes | We provide a basic implementation of AutoCLIP at https://github.com/boschresearch/autoclip. Code for reproducing the results of this section is available at https://github.com/boschresearch/autoclip. |
| Open Datasets | Yes | We conduct experiments on the datasets CUB200 (Welinder et al., 2010), EuroSAT (Helber et al., 2019), Food101 (Bossard et al., 2014), Oxford Pets (Parkhi et al., 2012), ImageNet (Russakovsky et al., 2015), ImageNetV2 (Kornblith et al., 2019), ImageNet-R (Hendrycks et al., 2021), and ImageNet-C (Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | We conduct experiments on the datasets CUB200 (Welinder et al., 2010), EuroSAT (Helber et al., 2019), Food101 (Bossard et al., 2014), Oxford Pets (Parkhi et al., 2012), ImageNet (Russakovsky et al., 2015), ImageNetV2 (Kornblith et al., 2019), ImageNet-R (Hendrycks et al., 2021), and ImageNet-C (Hendrycks & Dietterich, 2019). |
| Hardware Specification | Yes | Here, encoding an image takes 12.64ms on a V100 (minimum over 100 images). |
| Software Dependencies | Yes | For bisection, we use an independent call to scipy.optimize.bisect (Virtanen et al., 2020) (maxiter=100, xtol=1e-2, rtol=1e-2). |
| Experiment Setup | Yes | We set the target entropy to β log2 K, where the entropy reduction factor β ∈ [0, 1] is the new free hyperparameter that we set globally to β = 0.85... For bisection, we use an independent call to scipy.optimize.bisect (Virtanen et al., 2020) (maxiter=100, xtol=1e-2, rtol=1e-2). |
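The quoted setup (target entropy β log2 K, tuned via `scipy.optimize.bisect` with maxiter=100, xtol=1e-2, rtol=1e-2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-template scores, the bisection bracket `[-10, 10]` on the log step size, and the helper names are assumptions introduced here; only the target-entropy formula and the `bisect` parameters come from the quotes above.

```python
import numpy as np
from scipy.optimize import bisect


def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()


def entropy_bits(p):
    """Shannon entropy of a distribution, in bits."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()


def tune_weights(scores, beta=0.85):
    """Find softmax weights over K prompt-template scores whose entropy
    matches the target beta * log2(K), by bisecting over the log step size.

    `scores` and the bracket [-10, 10] are illustrative assumptions.
    """
    K = len(scores)
    target = beta * np.log2(K)

    def entropy_gap(log_step):
        # Larger step size -> peakier weights -> lower entropy.
        w = softmax(np.exp(log_step) * scores)
        return entropy_bits(w) - target

    # Parameters quoted from the paper's setup; the bracket is assumed.
    log_step = bisect(entropy_gap, -10.0, 10.0,
                      maxiter=100, xtol=1e-2, rtol=1e-2)
    return softmax(np.exp(log_step) * scores)


# Example: three hypothetical per-template similarity scores.
weights = tune_weights(np.array([1.0, 0.5, 0.2]), beta=0.85)
```

The sign change required by bisection holds whenever the scores are not all equal: a near-zero step size gives uniform weights (entropy log2 K, above the target), while a very large step size concentrates all weight on one template (entropy near 0, below it).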