Module-wise Adaptive Distillation for Multimodality Foundation Models

Authors: Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model [48] as the teacher model.
Researcher Affiliation | Collaboration | Chen Liang (Georgia Tech), Jiahui Yu (Google Research), Ming-Hsuan Yang (UC Merced, Google Research), Matthew Brown (Google Research), Yin Cui (NVIDIA Research), Tuo Zhao (Georgia Tech), Boqing Gong (Google Research), Tianyi Zhou (University of Maryland, College Park)
Pseudocode | Yes | Algorithm 1, "OPTIMA: Module Adaptive Distillation"
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We conduct task-specific distillation on three multimodal understanding tasks: visual question answering (VQA, [14]), visual entailment (SNLI-VE, [47]), and visual reasoning (NLVR2, [37]). We further train and evaluate the model using the Microsoft COCO Caption dataset [6] and the Karpathy test split, respectively.
Dataset Splits | Yes | For the VQA task, we conduct downstream fine-tuning and testing on the VQA 2.0 dataset [14], which consists of 83k images and 444k questions for training, and 41k images and 214k questions for validation. For the image captioning task on COCO, we use [6] for training and testing; it contains 113k images for training, 5k images for validation, and 5k images for testing.
Hardware Specification | Yes | We also extend our thanks to the TPU team for providing abundant computational infrastructure and resources.
Software Dependencies | No | The paper mentions software components such as "Adafactor with decoupled weight decay" (an optimizer) and a "sentence-piece model", but does not specify version numbers for these or for other key software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For all tasks, we train the student for T = 100k steps. We use Adafactor with decoupled weight decay [34] as the optimizer, with β = (0.9, 0.999) and a learning rate of 1×10^-3 under a linear decay schedule. We set α1 = 0, α2 = 1, and α3 = 1×10^-2 for all tasks. For OPTIMA, we set γ = 0.98, T0 = 10, P = 100, and an interval length of T/P = 1k steps. Full details are deferred to Appendix A.4.
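For concreteness, the reported hyperparameters can be collected into a configuration sketch. This is illustrative only: the dict layout and key names are assumptions (the authors' code is unreleased); the numeric values are the ones reported above.

```python
# Hyperparameters as reported in the paper. Key names are illustrative,
# not taken from the authors' (unreleased) code.
optima_config = {
    "total_steps": 100_000,          # T: student training steps
    "optimizer": "Adafactor + decoupled weight decay",
    "betas": (0.9, 0.999),
    "learning_rate": 1e-3,           # with a linear decay schedule
    "alpha1": 0.0,
    "alpha2": 1.0,
    "alpha3": 1e-2,
    "gamma": 0.98,                   # OPTIMA hyperparameter γ
    "T0": 10,
    "num_intervals": 100,            # P
}

# Length of each OPTIMA interval: T / P = 1k steps.
steps_per_interval = optima_config["total_steps"] // optima_config["num_intervals"]
```

This makes the arithmetic explicit: with T = 100k and P = 100, each interval spans 1k training steps, consistent with the T/P = 1k figure reported above.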