When Open-Vocabulary Visual Question Answering Meets Causal Adapter: Benchmark and Approach
Authors: Feifei Zhang, Zhaoyi Zhang, Xi Zhang, Changsheng Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across multiple datasets validate the superiority of our method over existing state-of-the-art approaches, demonstrating its robust generalization and adaptability in open-world VQA scenarios. [...] Tab. 2 and Tab. 3 present the experimental results on our reconstituted OVVQA datasets: OV-VQAv2, OV-GQA, and OV-OKVQA. We report performance across several aspects, including results for base and novel classes, arithmetic mean (Avg), and harmonic mean (H). [...] Ablation Studies. |
| Researcher Affiliation | Collaboration | 1) Tianjin University of Technology; 2) Alibaba Group; 3) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 4) School of Artificial Intelligence, University of Chinese Academy of Sciences; 5) Peng Cheng Laboratory |
| Pseudocode | No | The paper describes the methodology using natural language, mathematical equations, and diagrams (e.g., Figure 2 for causal graphs), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. It only mentions that the causal adapter is a 'plug-and-play module'. |
| Open Datasets | Yes | We construct OVVQA using three standard datasets commonly employed in closed-set VQA: VQA v2 (Goyal et al. 2017) with 0.65 million image-question pairs, GQA (Hudson and Manning 2019) with 1.1 million pairs for visual reasoning and compositional question answering, and OKVQA (Marino et al. 2019) with 14,055 pairs requiring external knowledge for answer reasoning. |
| Dataset Splits | Yes | Tab. 1 presents the number of classes and samples in the train and test sets across our three reconstructed OVVQA benchmarks: OV-VQAv2, OV-GQA, and OV-OKVQA. [...] OV-VQAv2 — Train: 2,743 base classes, 596,265 samples; Test: 2,743 base / 386 novel classes, 52,252 base / 9,594 novel samples. OV-GQA — Train: 1,022 base classes, 1,062,339 samples; Test: 1,022 base / 821 novel classes, 12,293 base / 13,008 novel samples. OV-OKVQA — Train: 14,040 base classes, 9,009 samples; Test: 14,040 base / 1,000 novel classes, 4,345 base / 701 novel samples. |
| Hardware Specification | No | The paper describes the experimental setup including parameters and training details, but does not specify any particular hardware components such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'VL-T5 and VL-BART models' and tools like 'Faster R-CNN' and 'Word Piece tokenization', and optimization with 'Adam'. However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | In our experiments, the number of layers LA is set to 3. [...] The parameters from our causal adapter and Eq.(10) are optimized using Adam with a learning rate of 5e-5. Batch sizes are set to 80 for VL-T5 and 128 for VL-BART. |
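Since no official code is released, the reported experiment setup can only be sketched. The following is a minimal, hypothetical configuration reflecting the hyperparameters quoted above (3 adapter layers, Adam at 5e-5, batch sizes of 80 for VL-T5 and 128 for VL-BART); all names and structure here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical training configurations reflecting the hyperparameters
# reported in the paper's experiment setup. The paper releases no code,
# so this dictionary layout is an illustrative sketch, not their API.

CONFIGS = {
    "VL-T5": {
        "adapter_layers": 3,    # L_A = 3 causal-adapter layers
        "optimizer": "Adam",
        "learning_rate": 5e-5,  # applied to adapter params and Eq. (10)
        "batch_size": 80,
    },
    "VL-BART": {
        "adapter_layers": 3,
        "optimizer": "Adam",
        "learning_rate": 5e-5,
        "batch_size": 128,      # larger batch reported for VL-BART
    },
}


def get_config(backbone: str) -> dict:
    """Look up the reported hyperparameters for a given backbone."""
    if backbone not in CONFIGS:
        raise KeyError(f"Unknown backbone: {backbone!r}")
    return CONFIGS[backbone]
```

A quick consistency check against the quoted dataset sizes: the OV-OKVQA split samples (9,009 train + 4,345 base test + 701 novel test) sum to 14,055, matching the OKVQA pair count cited under "Open Datasets".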