Explicitly Guided Difficulty-Controllable Visual Question Generation
Authors: Jiayuan Xie, Mengqiu Cheng, Xinting Zhang, Yi Cai, Guimin Hu, Mengying Xie, Qing Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on our constructed dataset show that our proposed framework outperforms existing state-of-the-art models in both automatic and human evaluations, and can controllably generate questions with the required number of reasoning steps, producing questions with varying difficulty levels. |
| Researcher Affiliation | Academia | 1 Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China; 2 School of Software Engineering, South China University of Technology, Guangzhou, China; 3 Department of Computer Science, University of Copenhagen, Denmark; 4 Department of Mathematics, The University of Hong Kong, Hong Kong SAR, China; 5 College of Computer Science, Chongqing University, China; 6 Guangdong Neusoft University, Foshan, China |
| Pseudocode | No | The paper describes the model architecture and its components (Reasoning Chain Selection, Visual Feature Extractor, Controllable Rewriting Module) in detail within the 'Model' section, but it does not present these as structured pseudocode or an algorithm block. The process steps are explained in paragraph form. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | According to our proposed difficulty definition, we construct our DVQA dataset from the GQA dataset (Hudson and Manning 2019) to evaluate model performance. The GQA dataset automatically constructs diverse questions involving various reasoning skills mainly through the visual genome scene graph structure (Johnson et al. 2015; Krishna et al. 2017), which is a dataset for real-world visual reasoning questions answering. |
| Dataset Splits | Yes | The dataset we constructed in this paper contains 220,657 question pairs. Specifically, 80% of our dataset is used as a training set, 10% as a validation set, and 10% as a test set. |
| Hardware Specification | Yes | Our model is implemented using the PyTorch framework and trained on a single GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the PyTorch framework, CLIP model, GPT-2 model, and Adamax optimizer, but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | The model is trained for up to 5 epochs utilizing the Adamax optimizer (Kingma and Ba 2015), with a batch size of 128 and a learning rate of 2 × 10⁻⁵. |
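The reported 80%/10%/10% split of the 220,657 question pairs can be sketched as below. This is a minimal reconstruction, not the authors' code: the random seed, shuffling strategy, and rounding behavior are all assumptions, since the paper only states the split percentages.

```python
# Hedged sketch of an 80/10/10 split over 220,657 question pairs.
# The seed and truncation-based rounding are assumptions; the paper
# only reports the percentages and the total count.
import random

TOTAL = 220_657
pairs = list(range(TOTAL))          # stand-in indices for the question pairs

random.seed(42)                     # assumed seed, not given in the paper
random.shuffle(pairs)

n_train = int(0.8 * TOTAL)          # 176,525 pairs
n_val = int(0.1 * TOTAL)            # 22,065 pairs
train = pairs[:n_train]
val = pairs[n_train:n_train + n_val]
test = pairs[n_train + n_val:]      # remainder: 22,067 pairs

assert len(train) + len(val) + len(test) == TOTAL
```

Because the split sizes are computed by truncation, the test set absorbs the two leftover pairs; any other rounding convention would shift a pair or two between splits.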
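The training hyperparameters quoted above (Adamax, batch size 128, learning rate 2 × 10⁻⁵, up to 5 epochs) can be expressed as a PyTorch configuration fragment. This is a sketch under stated assumptions: the model, dataset, and loss are placeholders, since the paper does not release code; only the hyperparameter values come from the paper.

```python
# Hedged config sketch of the reported setup: Adamax optimizer,
# batch size 128, lr 2e-5, up to 5 epochs. The model and data
# loader are placeholders, not the authors' architecture.
import torch

model = torch.nn.Linear(512, 512)   # placeholder for the actual VQG model

# Hyperparameters as reported in the paper
optimizer = torch.optim.Adamax(model.parameters(), lr=2e-5)
BATCH_SIZE = 128
MAX_EPOCHS = 5

# train_loader would be built from the DVQA training split
# train_loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE)
for epoch in range(MAX_EPOCHS):
    pass  # one training pass over train_loader per epoch
```

Note that Adamax (Kingma and Ba 2015) is the ℓ∞-norm variant of Adam; `torch.optim.Adamax` implements it directly, so no custom optimizer code is needed.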