Core-to-Global Reasoning for Compositional Visual Question Answering
Authors: Hao Zhou, Tingjin Luo, Zhangqi Jiang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experimental results on GQA, GQA-sub, VQA2.0 and Visual7W demonstrate the effectiveness and superiority of CTGR. |
| Researcher Affiliation | Academia | Hao Zhou1, Tingjin Luo2*, Zhangqi Jiang2 1Department of Operational Research and Planning, Naval University of Engineering, Wuhan, Hubei, China 2College of Science, National University of Defense Technology, Changsha, Hunan, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and methods using text and mathematical equations, but it does not contain a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | Yes | We evaluate our CTGR model on four benchmarks: GQA (Hudson and Manning 2019b), GQA-sub (Jing et al. 2022b), VQA2.0 (Goyal et al. 2017), and Visual7W(Zhu et al. 2016), and follow the same split in each dataset for training and testing. |
| Dataset Splits | Yes | We evaluate our CTGR model on four benchmarks: GQA (Hudson and Manning 2019b), GQA-sub (Jing et al. 2022b), VQA2.0 (Goyal et al. 2017), and Visual7W(Zhu et al. 2016), and follow the same split in each dataset for training and testing. |
| Hardware Specification | Yes | We trained and tested our model on an NVIDIA RTX 3090TI GPU. |
| Software Dependencies | No | Similar to CFR(Nguyen et al. 2022), we use the Glove to extract embedding vectors for questions. To extract image semantic features, we retrained both the attribute branch and relationship branch on SGG framework with the weight parameters proposed by Tang (Tang et al. 2019). |
| Experiment Setup | Yes | The model is trained with a batch size of 32 and an initial learning rate of 0.001 using Adam optimizer. Similar to CFR(Nguyen et al. 2022), we use the Glove to extract embedding vectors for questions. To extract image semantic features, we retrained both the attribute branch and relationship branch on SGG framework with the weight parameters proposed by Tang (Tang et al. 2019). The parameters dlk and dgc are empirically set to 768. We set λ=2, 4, 5, 6, 7, 9, and the corresponding results are shown in Tab. 6. From Tab. 6, we find that when λ = 6, our model can get the best performance. |
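The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is purely illustrative: the paper releases no code, so the key names (`batch_size`, `d_lk`, etc.) and the dict layout are assumptions; only the values themselves come from the paper's text.

```python
# Hedged sketch of the reported CTGR training setup. All names are
# hypothetical stand-ins; the values are taken from the paper's text.
TRAIN_CONFIG = {
    "batch_size": 32,                # "batch size of 32"
    "learning_rate": 1e-3,           # "initial learning rate of 0.001"
    "optimizer": "Adam",             # "using Adam optimizer"
    "d_lk": 768,                     # d_lk "empirically set to 768"
    "d_gc": 768,                     # d_gc "empirically set to 768"
    "lambda": 6,                     # best value among {2, 4, 5, 6, 7, 9} (Tab. 6)
    "question_embedding": "GloVe",   # following CFR (Nguyen et al. 2022)
}

def describe(cfg):
    """Render the sketch as a one-line summary string."""
    return (f"batch={cfg['batch_size']}, lr={cfg['learning_rate']}, "
            f"opt={cfg['optimizer']}, lambda={cfg['lambda']}")
```

A reproduction attempt would still have to guess the unreported details (learning-rate schedule, number of epochs, weight decay), which is consistent with the "No" verdicts for pseudocode and open-source code above.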