Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multimodal Few-Shot Learning with Frozen Language Models
Authors: Maria Tsimpoukelli, Jacob L Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are designed to quantify three capacities that should be characteristic of a multimodal few-shot learner: rapid adaptation to new tasks, fast access to general knowledge, and fast binding of visual and linguistic elements. We quantify these capabilities on a range of existing and new benchmarks, paving the way for future analysis of these capabilities. |
| Researcher Affiliation | Collaboration | Maria Tsimpoukelli (DeepMind), Jacob Menick (DeepMind and University College London), Serkan Cabi (DeepMind), S. M. Ali Eslami (DeepMind), Oriol Vinyals (DeepMind), Felix Hill (DeepMind) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The Open-Ended miniImageNet, Real-Name miniImageNet, Fast-VQA and Guided-VQA evaluation sets are available to download at https://fh295.github.io/frozen.html. This link provides evaluation datasets, not the source code for the method. |
| Open Datasets | Yes | We use a 7 billion parameter transformer trained on the public dataset C4 [31]; previous work has shown that the multi-billion parameter scale is sufficient to exhibit the key capacities we are interested in studying [30, 34]. During training, we update only the parameters φ of the vision encoder, using paired image-caption data from the Conceptual Captions dataset [37]. (A sketch of this frozen-LM training setup appears after the table.) |
| Dataset Splits | Yes | We do early stopping on the validation set perplexity, which usually reaches an optimum just after a single epoch with batch size 128. We evaluate on the VQAv2 [10] validation set. |
| Hardware Specification | No | The paper does not mention the specific hardware (such as GPU models, CPU types, or TPU versions) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as the SentencePiece tokenizer and the Adam optimizer, but does not provide version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All experiments used the Adam optimizer with β1 = 0.9 and β2 = 0.95 and a constant learning rate of 3e-4 unless otherwise noted. We do early stopping on the validation set perplexity, which usually reaches an optimum just after a single epoch with batch size 128. We experimented with different numbers of prefix tokens k (specifically 1, 2, and 4) and found that 2 performs best, though this would certainly be sensitive to other architectural details. We operate on 224 × 224 images at both train and test time. Images which are not square are first zero-padded to square and then resized to 224 × 224. (A preprocessing sketch appears after the table.) |
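The training setup quoted in the Open Datasets row (a frozen pretrained language model with gradients flowing only into a vision encoder that emits a prefix of k token embeddings) can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the names `VisionPrefixEncoder` and `frozen_training_step` are ours, the toy convnet stands in for the paper's NF-ResNet backbone, and the interfaces assumed for `lm`, `embed`, and `lm_head` (embeddings in, hidden states out; a separate vocabulary projection) are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionPrefixEncoder(nn.Module):
    """Toy stand-in for Frozen's vision encoder: maps an image to k
    continuous embeddings (a "visual prefix") in the LM's input space.
    The paper found k = 2 works best; it uses an NF-ResNet backbone,
    not this tiny convnet."""
    def __init__(self, d_model: int, k: int = 2):
        super().__init__()
        self.k, self.d_model = k, d_model
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.to_prefix = nn.Linear(64, k * d_model)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)                      # (B, 64)
        return self.to_prefix(feats).view(-1, self.k, self.d_model)

def frozen_training_step(vision_encoder, lm, embed, lm_head,
                         images, caption_ids):
    """One Frozen-style step: gradients flow *through* the frozen LM
    back into the vision encoder, whose parameters φ are the only
    ones being updated."""
    k = vision_encoder.k
    prefix = vision_encoder(images)                        # (B, k, d)
    text_emb = embed(caption_ids)                          # frozen lookup
    hidden = lm(torch.cat([prefix, text_emb], dim=1))      # frozen transformer
    logits = lm_head(hidden)                               # (B, k + T, vocab)
    # The logit at position k - 1 predicts caption token 0, and so on.
    pred = logits[:, k - 1 : -1].reshape(-1, logits.size(-1))
    return F.cross_entropy(pred, caption_ids.reshape(-1))

# Only the vision encoder is optimized; everything else stays frozen:
#   for p in (*lm.parameters(), *embed.parameters(), *lm_head.parameters()):
#       p.requires_grad_(False)
#   opt = torch.optim.Adam(vision_encoder.parameters(), lr=3e-4,
#                          betas=(0.9, 0.95))
```

The commented optimizer lines mirror the hyperparameters reported in the Experiment Setup row (Adam with β1 = 0.9, β2 = 0.95, learning rate 3e-4).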
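The zero-pad-then-resize preprocessing from the Experiment Setup row is easy to mis-implement (padding after resizing changes the aspect ratio), so here is a minimal sketch. The function name is ours, and padding on the right/bottom is an assumption; the paper only says images are zero-padded to square and resized to 224 × 224.

```python
import torch
import torch.nn.functional as F

def pad_to_square_and_resize(image: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Zero-pad a (C, H, W) image to a square, then resize it to
    (C, size, size), following the preprocessing the paper describes."""
    _, h, w = image.shape
    side = max(h, w)
    # F.pad pads the last dims first: (W_left, W_right, H_top, H_bottom).
    padded = F.pad(image, (0, side - w, 0, side - h), value=0.0)
    # Bilinear resize expects a leading batch dimension.
    resized = F.interpolate(padded.unsqueeze(0), size=(size, size),
                            mode="bilinear", align_corners=False)
    return resized.squeeze(0)

# Example: a 3 x 180 x 240 image becomes 3 x 224 x 224.
img = torch.rand(3, 180, 240)
assert pad_to_square_and_resize(img).shape == (3, 224, 224)
```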