Vision-Language Instruction Tuning: A Review and Analysis
Authors: Chen Li, Yixiao Ge, Dian Li, Ying Shan
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs. [...] Section 5 primarily includes the design, implementation, and discussion of the verification experiment. |
| Researcher Affiliation | Industry | Chen Li EMAIL, Yixiao Ge EMAIL, Dian Li EMAIL, Ying Shan EMAIL, ARC Lab, Tencent PCG Foundation Technology Center, Tencent PCG |
| Pseudocode | No | The paper describes the proposed pipeline in prose and illustrates it with a flowchart in Figure 4, but it does not contain any formally structured pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code and dataset related to this paper have been open-sourced at https://github.com/palchenli/VL-Instruction-Tuning. |
| Open Datasets | Yes | The code and dataset related to this paper have been open-sourced at https://github.com/palchenli/VL-Instruction-Tuning. [...] Specifically, in data collection, we first select COCO 2014 (Lin et al., 2014) as the image source, and {caption, object, attribute, OCR, visual QA} as the selected sources of annotation data (Antol et al., 2015; Patterson & Hays, 2016; Veit et al., 2016). |
| Dataset Splits | No | The paper states: "In the quality evaluation process, to ensure fairness, we use the smallest dataset size as the scale for all the test VLIT data and randomly sample VLIT datasets larger than this scale." This describes a method for selecting evaluation data size but does not provide explicit train/validation/test splits for the constructed VLIT data or the existing VLIT datasets used for instruction tuning. |
| Hardware Specification | Yes | These MLLMs are all trained using 8 Tesla V100 (32GB) GPUs; the Python environment and other detailed settings (e.g., hyperparameters) of the three models can be found in Section A.3 of the Appendix. |
| Software Dependencies | No | The paper mentions a "Python environment" and specific MLLM libraries such as "LLaVA library", "LAVIS", and "Open Flamingo", but it does not provide version numbers for these software components or for the Python interpreter itself. |
| Experiment Setup | Yes | These MLLMs are all trained using 8 Tesla V100 (32GB) GPUs; the Python environment and other detailed settings (e.g., hyperparameters) of the three models can be found in Section A.3 of the Appendix. [...] Table 11, Table 12, and Table 13 respectively list all their hyperparameter settings during instruction tuning. |