When Open-Vocabulary Visual Question Answering Meets Causal Adapter: Benchmark and Approach
Authors: Feifei Zhang, Zhaoyi Zhang, Xi Zhang, Changsheng Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across multiple datasets validate the superiority of our method over existing state-of-the-art approaches, demonstrating its robust generalization and adaptability in open-world VQA scenarios. [...] Tab. 2 and Tab. 3 present the experimental results on our reconstituted OVVQA datasets: OV-VQAv2, OV-GQA, and OV-OKVQA. We report performance across several aspects, including results for base and novel classes, arithmetic mean (Avg), and harmonic mean (H). [...] Ablation Studies. |
| Researcher Affiliation | Collaboration | 1) Tianjin University of Technology; 2) Alibaba Group; 3) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 4) School of Artificial Intelligence, University of Chinese Academy of Sciences; 5) Peng Cheng Laboratory |
| Pseudocode | No | The paper describes the methodology using natural language, mathematical equations, and diagrams (e.g., Figure 2 for causal graphs), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. It only mentions that the causal adapter is a 'plug-and-play module'. |
| Open Datasets | Yes | We construct OVVQA using three standard datasets commonly employed in closed-set VQA: VQA v2 (Goyal et al. 2017) with 0.65 million image-question pairs, GQA (Hudson and Manning 2019) with 1.1 million pairs for visual reasoning and compositional question answering, and OKVQA (Marino et al. 2019) with 14,055 pairs requiring external knowledge for answer reasoning. |
| Dataset Splits | Yes | Tab. 1 presents the number of classes and samples in the train and test sets across our three reconstructed OVVQA benchmarks: OV-VQAv2, OV-GQA, and OV-OKVQA. [...] OV-VQAv2 — Train: 2,743 base classes, 596,265 samples; Test: 2,743 base / 386 novel classes, 52,252 base / 9,594 novel samples. OV-GQA — Train: 1,022 base classes, 1,062,339 samples; Test: 1,022 base / 821 novel classes, 12,293 base / 13,008 novel samples. OV-OKVQA — Train: 14,040 base classes, 9,009 samples; Test: 14,040 base / 1,000 novel classes, 4,345 base / 701 novel samples. |
| Hardware Specification | No | The paper describes the experimental setup including parameters and training details, but does not specify any particular hardware components such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'VL-T5 and VL-BART models' and tools like 'Faster R-CNN' and 'Word Piece tokenization', and optimization with 'Adam'. However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | In our experiments, the number of layers LA is set to 3. [...] The parameters from our causal adapter and Eq.(10) are optimized using Adam with a learning rate of 5e-5. Batch sizes are set to 80 for VL-T5 and 128 for VL-BART. |
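Since no official code is released, the reported experiment setup can only be sketched. The following is a minimal, hypothetical configuration reflecting the hyperparameters quoted above (3 adapter layers, Adam at 5e-5, batch sizes of 80 for VL-T5 and 128 for VL-BART); all names and structure here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical training configurations reflecting the hyperparameters
# reported in the paper's experiment setup. The paper releases no code,
# so this dictionary layout is an illustrative sketch, not their API.

CONFIGS = {
    "VL-T5": {
        "adapter_layers": 3,    # L_A = 3 causal-adapter layers
        "optimizer": "Adam",
        "learning_rate": 5e-5,  # applied to adapter params and Eq. (10)
        "batch_size": 80,
    },
    "VL-BART": {
        "adapter_layers": 3,
        "optimizer": "Adam",
        "learning_rate": 5e-5,
        "batch_size": 128,      # larger batch reported for VL-BART
    },
}


def get_config(backbone: str) -> dict:
    """Look up the reported hyperparameters for a given backbone."""
    if backbone not in CONFIGS:
        raise KeyError(f"Unknown backbone: {backbone!r}")
    return CONFIGS[backbone]
```

A quick consistency check against the quoted dataset sizes: the OV-OKVQA split samples (9,009 train + 4,345 base test + 701 novel test) sum to 14,055, matching the OKVQA pair count cited under "Open Datasets".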