Enhancing Question Generation through Diversity-Seeking Reinforcement Learning with Bilevel Policy Decomposition
Authors: Tianyu Ren, Hui Wang, Karen Rafferty
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our integrated approach, named BPD-DSRL, demonstrates superior performance over existing baselines on multiple question quality and diversity metrics across various QG benchmarks. ... Table 2 presents the comparative results on three widely-used QG benchmarks. ... We conduct a series of ablation studies on our BPD framework and DSRL objective. |
| Researcher Affiliation | Academia | Tianyu Ren, Hui Wang*, Karen Rafferty — School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, United Kingdom. EMAIL |
| Pseudocode | No | The pseudo-code for BPD-DSRL training and further technical specifics are provided in the supplementary material. |
| Open Source Code | Yes | Code and other supplementary material https://github.com/Tianyu-Ren/BPD-DSRL |
| Open Datasets | Yes | Following previous work (Gou et al. 2023; Narayan et al. 2022; Wang et al. 2020), we conduct experiments on two QG datasets: SQuAD 1.1 (Rajpurkar et al. 2016) and NewsQA (Trischler et al. 2017). |
| Dataset Splits | Yes | Table 1: Statistics of the selected benchmarks. SQuAD 1.1/1 and SQuAD 1.1/2 are two different splits of SQuAD 1.1 from (Zhou et al. 2017) and (Du, Shao, and Cardie 2017). |
| Hardware Specification | No | The implementation of them is detailed in the supplementary material. (The main text does not specify hardware used for experiments.) |
| Software Dependencies | No | All of our QG models and outcome reward models start from the pre-trained checkpoints of T5-large (Raffel et al. 2020). ... To assess hallucination, we employ spaCy to extract named entities ... For precision and cost-effectiveness, we utilize GPT-3.5 (Turbo-0125) in a zero-shot setting as the QA model. (No specific version numbers for software libraries are mentioned in the main text.) |
| Experiment Setup | No | We use consistent hyperparameter configurations across all three datasets during training (SFT warm-up and RL) and inference. The implementation of them is detailed in the supplementary material. (Specific hyperparameter values are not provided in the main text, only in the supplementary material.) |