Explicitly Guided Difficulty-Controllable Visual Question Generation
Authors: Jiayuan Xie, Mengqiu Cheng, Xinting Zhang, Yi Cai, Guimin Hu, Mengying Xie, Qing Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on our constructed dataset show that our proposed framework outperforms existing state-of-the-art models in both automatic and human evaluations, and can controllably generate questions with the required number of reasoning steps, producing questions with varying difficulty levels. |
| Researcher Affiliation | Academia | 1 Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China; 2 School of Software Engineering, South China University of Technology, Guangzhou, China; 3 Department of Computer Science, University of Copenhagen, Denmark; 4 Department of Mathematics, The University of Hong Kong, Hong Kong SAR, China; 5 College of Computer Science, Chongqing University, China; 6 Guangdong Neusoft University, Foshan, China |
| Pseudocode | No | The paper describes the model architecture and its components (Reasoning Chain Selection, Visual Feature Extractor, Controllable Rewriting Module) in detail within the 'Model' section, but it does not present these as structured pseudocode or an algorithm block. The process steps are explained in paragraph form. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | According to our proposed difficulty definition, we construct our DVQA dataset from the GQA dataset (Hudson and Manning 2019) to evaluate model performance. The GQA dataset automatically constructs diverse questions involving various reasoning skills mainly through the visual genome scene graph structure (Johnson et al. 2015; Krishna et al. 2017), which is a dataset for real-world visual reasoning questions answering. |
| Dataset Splits | Yes | The dataset we constructed in this paper contains 220,657 question pairs. Specifically, 80% of our dataset is used as a training set, 10% as a validation set, and 10% as a test set. |
| Hardware Specification | Yes | Our model is implemented using the PyTorch framework and trained on a single GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the PyTorch framework, CLIP model, GPT-2 model, and Adamax optimizer, but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | The model is trained for up to 5 epochs utilizing the Adamax optimizer (Kingma and Ba 2015), with a batch size of 128 and a learning rate of 2 × 10⁻⁵. |
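The reported 80%/10%/10% split of the 220,657 question pairs can be sketched as below. This is a minimal reconstruction, not the authors' code: the random seed, shuffling strategy, and rounding behavior are all assumptions, since the paper only states the split percentages.

```python
# Hedged sketch of an 80/10/10 split over 220,657 question pairs.
# The seed and truncation-based rounding are assumptions; the paper
# only reports the percentages and the total count.
import random

TOTAL = 220_657
pairs = list(range(TOTAL))          # stand-in indices for the question pairs

random.seed(42)                     # assumed seed, not given in the paper
random.shuffle(pairs)

n_train = int(0.8 * TOTAL)          # 176,525 pairs
n_val = int(0.1 * TOTAL)            # 22,065 pairs
train = pairs[:n_train]
val = pairs[n_train:n_train + n_val]
test = pairs[n_train + n_val:]      # remainder: 22,067 pairs

assert len(train) + len(val) + len(test) == TOTAL
```

Because the split sizes are computed by truncation, the test set absorbs the two leftover pairs; any other rounding convention would shift a pair or two between splits.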
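The training hyperparameters quoted above (Adamax, batch size 128, learning rate 2 × 10⁻⁵, up to 5 epochs) can be expressed as a PyTorch configuration fragment. This is a sketch under stated assumptions: the model, dataset, and loss are placeholders, since the paper does not release code; only the hyperparameter values come from the paper.

```python
# Hedged config sketch of the reported setup: Adamax optimizer,
# batch size 128, lr 2e-5, up to 5 epochs. The model and data
# loader are placeholders, not the authors' architecture.
import torch

model = torch.nn.Linear(512, 512)   # placeholder for the actual VQG model

# Hyperparameters as reported in the paper
optimizer = torch.optim.Adamax(model.parameters(), lr=2e-5)
BATCH_SIZE = 128
MAX_EPOCHS = 5

# train_loader would be built from the DVQA training split
# train_loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE)
for epoch in range(MAX_EPOCHS):
    pass  # one training pass over train_loader per epoch
```

Note that Adamax (Kingma and Ba 2015) is the ℓ∞-norm variant of Adam; `torch.optim.Adamax` implements it directly, so no custom optimizer code is needed.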