Enhancing Question Generation through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition

Authors: Tianyu Ren, Hui Wang, Karen Rafferty

AAAI 2025

Reproducibility assessment (variable — result, followed by the supporting LLM response):

Research Type — Experimental
  "Our integrated approach, named BPD-DSRL, demonstrates superior performance over existing baselines on multiple question quality and diversity metrics across various QG benchmarks. ... Table 2 presents the comparative results on three widely-used QG benchmarks. ... We conduct a series of ablation studies on our BPD framework and DSRL objective."

Researcher Affiliation — Academia
  "Tianyu Ren, Hui Wang*, Karen Rafferty. School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, United Kingdom. EMAIL"

Pseudocode — No
  "The pseudo-code for BPD-DSRL training and further technical specifics are provided in the supplementary material."

Open Source Code — Yes
  "Code and other supplementary material: https://github.com/Tianyu-Ren/BPD-DSRL"

Open Datasets — Yes
  "Following previous work (Gou et al. 2023; Narayan et al. 2022; Wang et al. 2020), we conduct experiments on two QG datasets: SQuAD 1.1 (Rajpurkar et al. 2016) and NewsQA (Trischler et al. 2017)."

Dataset Splits — Yes
  "Table 1: Statistics of the selected benchmarks. SQuAD 1.1/1 and SQuAD 1.1/2 are two different splits of SQuAD 1.1 from (Zhou et al. 2017) and (Du, Shao, and Cardie 2017)."

Hardware Specification — No
  "The implementation of them is detailed in the supplementary material." (The main text does not specify the hardware used for the experiments.)

Software Dependencies — No
  "All of our QG models and outcome reward models start from the pre-trained checkpoints of T5-large (Raffel et al. 2020). ... To assess hallucination, we employ spaCy to extract named entities ... For precision and cost-effectiveness, we utilize GPT-3.5 (Turbo-0125) in a zero-shot setting as the QA model." (No specific version numbers for software libraries are mentioned in the main text.)

Experiment Setup — No
  "We use consistent hyperparameter configurations across all three datasets during training (SFT warm-up and RL) and inference. The implementation of them is detailed in the supplementary material." (Specific hyperparameter values are not given in the main text.)
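The Software Dependencies row notes that the paper uses spaCy named-entity extraction to assess hallucination in generated questions. The paper does not describe the exact check, but a plausible sketch is: flag a generated question if it mentions a named entity that never appears in the source passage. Entity extraction itself would use spaCy (e.g. the `.text` of each span in `nlp(passage).ents`); the comparison step over already-extracted entity strings, with a hypothetical function name, might look like:

```python
def hallucinated_entities(context_ents, question_ents):
    """Return entities mentioned in the question but absent from the context.

    Both arguments are iterables of entity strings, e.g. the .text of
    spaCy spans taken from nlp(text).ents. Matching is case-insensitive;
    a non-empty result suggests the question hallucinates an entity.
    Note: this is an illustrative sketch, not the paper's actual metric.
    """
    context_lower = {e.lower() for e in context_ents}
    return [e for e in question_ents if e.lower() not in context_lower]


if __name__ == "__main__":
    # Entities extracted from a source passage vs. a generated question.
    ctx = ["Queen's University Belfast", "Tianyu Ren"]
    q = ["Tianyu Ren", "Oxford"]
    print(hallucinated_entities(ctx, q))  # -> ['Oxford']
```

String-level, case-insensitive matching is a deliberate simplification; a production check would likely also normalize entity variants (e.g. abbreviations) before comparing.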