Multi-Modal Answer Validation for Knowledge-Based VQA

Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

AAAI 2022, pp. 2712-2721

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results.
Researcher Affiliation | Collaboration | Jialin Wu (1), Jiasen Lu (2), Ashish Sabharwal (2), Roozbeh Mottaghi (2); (1) The University of Texas at Austin, (2) Allen Institute for AI
Pseudocode | No | The paper describes the framework steps in prose but does not include an explicitly labeled 'Pseudocode' or 'Algorithm' block. (A hedged sketch of those steps follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/jialinwu17/MAVEX
Open Datasets | Yes | We evaluate MAVEx on OK-VQA (Marino et al. 2019), the largest knowledge-based VQA dataset to date.
Dataset Splits | Yes | The dataset contains 14,031 images and 14,055 questions... We use the finetuned model to extract the top 5 answers for each question in the training and test set.
Hardware Specification | Yes | We use PyTorch 1.4 on a single TITAN V GPU with 12 GB memory for each run, and it generally costs 22 hours to train a single model. (See the environment check after the table.)
Software Dependencies | No | The paper mentions 'PyTorch 1.4' but does not provide version numbers for other significant software dependencies such as AllenNLP, the T5 model, Mask R-CNN, or the specific BERT/TinyBERT implementations used.
Experiment Setup | Yes | We finetune the ViLBERT-multi-task model on OK-VQA using the default configuration for 150 epochs for answer candidate generation... We train the system for 75 epochs using a learning rate of 2e-5 for the ViLBERT parameters and 5e-5 for the additional parameters introduced in the validation module... The number of hidden units in the multi-head attention modules is set to 512. (See the optimizer sketch after the table.)
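
Because the paper presents the framework only in prose (see the Pseudocode row above), here is a minimal Python sketch of the described pipeline: a finetuned ViLBERT model proposes top-5 answer candidates, and a validation module scores each one. Every function name and the toy vocabulary below are placeholders invented for illustration, not the authors' API.

```python
import random

def generate_candidates(question, image, top_k=5):
    """Placeholder for the finetuned ViLBERT-multi-task model, which the
    paper uses to extract the top-5 answer candidates per question."""
    toy_vocabulary = ["surfing", "skiing", "kayaking", "rowing", "diving"]
    return toy_vocabulary[:top_k]

def validation_score(question, image, answer):
    """Placeholder for MAVEx's validation module, which scores each
    candidate against the knowledge retrieved for that answer."""
    return random.random()

def mavex_answer(question, image):
    # Step 1: generate answer candidates with the finetuned VQA model.
    candidates = generate_candidates(question, image, top_k=5)
    # Step 2: score every candidate with the answer-validation module.
    scores = {a: validation_score(question, image, a) for a in candidates}
    # Step 3: return the candidate with the strongest support.
    return max(scores, key=scores.get)

print(mavex_answer("What sport can you do at this beach?", image=None))
```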
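
For the Hardware Specification row, a quick sanity check of the environment is possible using only standard `torch.cuda` calls; the "TITAN V" and "12 GB" values in the comments are what the paper reports, not anything this snippet enforces.

```python
import torch

# Verify the environment against the reported setup:
# PyTorch 1.4 on a single TITAN V with 12 GB of memory.
print("PyTorch:", torch.__version__)  # paper reports 1.4
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)                                # expected: TITAN V
    print(f"Memory: {props.total_memory / 1024**3:.1f} GB")  # expected: ~12 GB
```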
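
The two-learning-rate schedule in the Experiment Setup row maps naturally onto PyTorch parameter groups. The sketch below is an assumption-laden illustration: the modules are tiny stand-ins for the real ViLBERT backbone and validation module, and the optimizer choice (AdamW) and head count are guesses, since the paper only fixes the learning rates, the epoch count, and the 512 hidden units.

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks: ViLBERT is a large two-stream
# transformer, and the validation module uses multi-head attention
# with 512 hidden units (num_heads=8 is an assumption).
vilbert_backbone = nn.Linear(768, 512)
validation_module = nn.MultiheadAttention(embed_dim=512, num_heads=8)

# One optimizer, two parameter groups with the learning rates the
# paper reports: 2e-5 for ViLBERT, 5e-5 for the added parameters.
optimizer = torch.optim.AdamW([
    {"params": vilbert_backbone.parameters(), "lr": 2e-5},
    {"params": validation_module.parameters(), "lr": 5e-5},
])

for epoch in range(75):  # the paper trains the full system for 75 epochs
    optimizer.zero_grad()
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
```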