Multi-Sourced Compositional Generalization in Visual Question Answering
Authors: Chuanhao Li, Wenbo Ye, Zhen Li, Yuwei Wu, Yunde Jia
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we explore MSCG in the context of visual question answering (VQA), and propose a retrieval-augmented training framework to enhance the MSCG ability of VQA models by learning unified representations for primitives from different modalities. ... To evaluate the MSCG ability of VQA models, we construct a new GQA-MSCG dataset based on the GQA dataset... Experimental results demonstrate that the proposed framework significantly improves VQA models' generalization ability to multi-sourced novel compositions while maintaining their independent and identically distributed (IID) generalization ability. |
| Researcher Affiliation | Academia | 1Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology, China 2Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University, China |
| Pseudocode | No | The paper describes the proposed framework and its components (retrieval database construction, feature retrieval, and feature aggregation) in descriptive text and with a diagram (Figure 2), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The GQA-MSCG dataset is available at https://github.com/NeverMoreLCH/MSCG. This statement provides access only to the GQA-MSCG dataset, not to the source code for the methodology described in the paper. |
| Open Datasets | Yes | To evaluate the MSCG ability of VQA models, we construct a new GQA-MSCG dataset based on the GQA dataset... The GQA-MSCG dataset is available at https://github.com/NeverMoreLCH/MSCG. ... three datasets are selected to validate the effectiveness of the proposed frameworks: the GQA dataset [Hudson and Manning, 2019], the VQA v2 dataset [Goyal et al., 2017] and our GQA-MSCG dataset. |
| Dataset Splits | Yes | For experiments on the GQA dataset and the GQA-MSCG dataset, we fine-tune CFR, Qwen-VL, CFR+RAG, and Qwen-VL+RAG using the train balanced split of the GQA dataset and select the best-performing model weights on the val balanced split of GQA. Using these model weights, we present the experimental results on the test-dev split of the GQA dataset and all seven test splits of our GQA-MSCG dataset. ... For each category of test samples, we randomly sample 5,000 samples from Dc, resulting in a total of 35,000 samples for the GQA-MSCG dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions model sizes (e.g., 'parameter size less than 0.2B', 'more than 7B parameters'). |
| Software Dependencies | No | The paper mentions using the 'NLTK toolkit [Bird et al., 2009]' but does not specify a version number. It also refers to methods like 'LoRA [Hu et al., 2022]' and 'Faster R-CNN [Ren et al., 2016]', which are architectures or techniques, not specific software libraries with version numbers required for replication. |
| Experiment Setup | Yes | For experiments on all three datasets including GQA, GQA-MSCG and VQA v2, we fine-tune Qwen-VL and Qwen-VL+RAG with LoRA [Hu et al., 2022] with a maximum of 2 epochs. For CFR+RAG and Qwen-VL+RAG, we set wq = 0.6 and wv = 0.4. ... The maximum number of epochs for fine-tuning CFR and CFR+RAG was set to 12. The sampled numbers Tq and Tv for constructing Dq and Dv are set to 8 and 32, respectively. ... The numbers of aggregated primitives Kq and Kv are set to 4 and 16, respectively. Distinctively, for experiments on the VQA v2 dataset, we set Tq = 1, Tv = 32, Kq = 4 and Kv = 4. |
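The dataset-splits row quotes a concrete construction step: 5,000 test samples are drawn per composition category from a candidate pool Dc, giving 35,000 samples across the seven GQA-MSCG test splits. A minimal sketch of that sampling procedure is shown below; the function name `build_test_splits`, the dictionary layout of the candidate pools, and the fixed seed are illustrative assumptions, not details taken from the paper.

```python
import random

# Assumed constants from the quoted description: 7 test categories,
# 5,000 samples drawn per category (35,000 total).
NUM_CATEGORIES = 7
SAMPLES_PER_CATEGORY = 5_000

def build_test_splits(candidates_by_category, seed=0):
    """Hypothetical sketch: draw a fixed-size random test split per category.

    candidates_by_category maps a category name to its candidate pool (Dc).
    A seeded RNG is used so the sampling is reproducible.
    """
    rng = random.Random(seed)
    splits = {}
    for category, pool in candidates_by_category.items():
        # Sample without replacement, as each test sample appears once.
        splits[category] = rng.sample(pool, SAMPLES_PER_CATEGORY)
    return splits
```

With seven pools of sufficient size, the resulting splits total 35,000 samples, matching the figure quoted from the paper.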