KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
Authors: Haoran Luo, Haihong E, Yikai Guo, Qika Lin, Xiaobao Wu, Xinyu Mu, Wenhao Liu, Meina Song, Yifan Zhu, Anh Tuan Luu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that KBQA-o1 outperforms previous low-resource KBQA methods with limited annotated data, boosting the Llama-3.1-8B model's GrailQA F1 performance to 78.5%, compared to 48.5% for the previous state-of-the-art method with GPT-3.5-turbo. Our code is publicly available at https://github.com/LHRLAB/KBQA-o1. ... We perform experiments on three KBQA datasets, GrailQA (Gu et al., 2021), WebQSP (Yih et al., 2016) and GraphQ (Su et al., 2016), in low-resource settings (Li et al., 2023) for applications with limited annotated data. Experimental results demonstrate that KBQA-o1 outperforms existing low-resource KBQA methods and even approaches or surpasses the performance of fully supervised KBQA models, especially in more difficult cases like compositional and zero-shot. Ablation studies further validate the proposed MCTS-based agent process and incremental fine-tuning, both of which make KBQA-o1 outperform other forms of KBQA methods, as shown in Figure 2. |
| Researcher Affiliation | Academia | Haoran Luo 1 2 Haihong E 1 Yikai Guo 3 Qika Lin 4 Xiaobao Wu 2 Xinyu Mu 1 Wenhao Liu 1 Meina Song 1 Yifan Zhu 1 Luu Anh Tuan 2 1Beijing University of Posts and Telecommunications 2Nanyang Technological University 3Beijing Institute of Computer Technology and Application 4National University of Singapore. Correspondence to: Haihong E <EMAIL>. |
| Pseudocode | Yes | Figure 8 and Algorithm 1 illustrate the Monte Carlo Tree Search (MCTS) process in KBQA-o1. The figure highlights the four stages of MCTS: Selection, where nodes are chosen using the Upper Confidence Bound for Trees (UCT) to balance exploration and exploitation; Expansion, where candidate actions are generated by the policy model, filtered for relevance to the knowledge base, and added as child nodes; Simulation, where the most promising path is explored to produce a complete logical form and compute rewards; and Back-propagation, where rewards are propagated back to update Q-values and visit counts. The pseudocode formalizes this process, iteratively performing rollouts that follow the four stages. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/LHRLAB/KBQA-o1. |
| Open Datasets | Yes | We perform experiments on three KBQA datasets, GrailQA (Gu et al., 2021), WebQSP (Yih et al., 2016) and GraphQ (Su et al., 2016), in low-resource settings (Li et al., 2023) for applications with limited annotated data. |
| Dataset Splits | Yes | Following KB-BINDER (Li et al., 2023), we conduct 40-shot experiments for GrailQA, and 100-shot for WebQSP and GraphQ. ... Table 6 (dataset statistics of KBQA-o1): #Train — GrailQA 40, WebQSP 100, GraphQ 100; #Exploration — GrailQA 43851, WebQSP 2929, GraphQ 2332; #Test — GrailQA 6696 total (I.I.D 1564, Compositional 1487, Zero-shot 3645), WebQSP 1566, GraphQ 2319. |
| Hardware Specification | Yes | All experiments are done on 8 NVIDIA A40 GPUs (48GB), with results averaged from three randomly seeded experiments. |
| Software Dependencies | No | The paper mentions several LLMs used (Llama-3, Qwen2.5, Gemma-2) and a model for semantic similarity (SimCSE), but does not provide specific version numbers for ancillary software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | During the MCTS exploration phase, we set θexp with w = 50, while in the prediction phase, we set θeff with w = 10. We select multiple open-source 7B-72B LLMs, including Llama-3 (Dubey et al., 2024), Qwen2.5 (Yang et al., 2025) and Gemma-2 (Team et al., 2024), to construct KBQA-o1. ... Appendix G shows the optimal hyperparameter settings. ... Table 7 presents the hyperparameter configurations for KBQA-o1 across three datasets: GrailQA, WebQSP, and GraphQ. These parameters are categorized into four stages: Initial Few-shot SFT, MCTS Exploration Stage, Incremental Fine-tuning, and MCTS Prediction Stage, each designed to optimize the KBQA framework's performance for different tasks. |
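The four-stage MCTS loop quoted in the Pseudocode row (Selection via UCT, Expansion, Simulation, Back-propagation) can be sketched as below. This is a minimal generic MCTS sketch, not the paper's actual implementation: the `Node` fields, `expand_fn` (standing in for policy-model action proposals filtered against the knowledge base), and `reward_fn` (standing in for the reward model scoring a complete logical form) are hypothetical placeholders.

```python
import math
import random

class Node:
    """One search-tree node holding a partial logical form (illustrative)."""
    def __init__(self, state, parent=None):
        self.state = state        # partial logical form so far
        self.parent = parent
        self.children = []
        self.visits = 0           # visit count N(node)
        self.q_value = 0.0        # accumulated reward Q(node)

def uct_score(node, c=1.4):
    """Upper Confidence Bound for Trees: exploitation + exploration."""
    if node.visits == 0:
        return float("inf")       # always try unvisited children first
    exploit = node.q_value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def rollout(root, expand_fn, reward_fn, max_depth=5):
    """One MCTS rollout through the four stages described in the paper."""
    # 1. Selection: descend to a leaf, picking the child maximizing UCT.
    node = root
    while node.children:
        node = max(node.children, key=uct_score)
    # 2. Expansion: add candidate next states as child nodes.
    for state in expand_fn(node.state):
        node.children.append(Node(state, parent=node))
    if node.children:
        node = random.choice(node.children)
    # 3. Simulation: complete the path to a full logical form and score it.
    state = node.state
    for _ in range(max_depth):
        candidates = expand_fn(state)
        if not candidates:
            break
        state = candidates[0]     # greedy completion for the sketch
    reward = reward_fn(state)
    # 4. Back-propagation: push the reward up to the root.
    while node is not None:
        node.visits += 1
        node.q_value += reward
        node = node.parent
    return reward
```

Repeated calls to `rollout` accumulate Q-values and visit counts, so UCT gradually shifts selection toward high-reward branches while still exploring rarely visited ones, matching the exploration/exploitation balance the quoted pseudocode description attributes to the Selection stage.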