Enhancing Multi-Hop Fact Verification with Structured Knowledge-Augmented Large Language Models

Authors: Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on several commonly used multi-hop fact verification datasets, FEVER (Thorne et al. 2018b) and HOVER (Jiang et al. 2020), to assess the effectiveness of LLM-SKAN. The experimental results on four commonly used datasets demonstrate the effectiveness and superiority of our model.
Researcher Affiliation | Academia | Han Cao1,2, Lingwei Wei1*, Wei Zhou1, Songlin Hu1,2 — 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes methods using natural language descriptions and mathematical equations, such as for the LLM-driven Knowledge Extractor prompt, graph neural network updates, and classification, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code will be released at https://github.com/HanCao12/LLM-SKAN
Open Datasets | Yes | To evaluate the effectiveness of LLM-SKAN for both single-hop and multi-hop fact verification tasks, we choose 4 public benchmarks, FEVER (Thorne et al. 2018b) and 2-, 3-, and 4-hop HOVER (Jiang et al. 2020), to conduct experiments.
Dataset Splits | Yes | The statistics are shown in Table 2.
Dataset | Train | Dev | Test
FEVER | 145,449 | 19,998 | 19,998
2-hop HOVER | 9,052 | 1,126 | 1,333
3-hop HOVER | 6,084 | 1,835 | 1,333
4-hop HOVER | 33,035 | 1,039 | 1,333
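For quick sanity checks when reproducing the paper, the split statistics above can be captured in a short script. This is a sketch: the numbers are copied verbatim from the table, while the structure and helper name are our own.

```python
# Dataset split sizes (train, dev, test) as reported in Table 2 of the paper.
SPLITS = {
    "FEVER":       (145_449, 19_998, 19_998),
    "2-hop HOVER": (9_052, 1_126, 1_333),
    "3-hop HOVER": (6_084, 1_835, 1_333),
    "4-hop HOVER": (33_035, 1_039, 1_333),
}

def total_examples(name: str) -> int:
    """Sum of train/dev/test sizes for one benchmark."""
    return sum(SPLITS[name])

for name in SPLITS:
    print(f"{name}: {total_examples(name)} total examples")
```

Comparing these totals against the counts of a downloaded copy of each dataset is a cheap way to confirm you are using the same splits as the paper.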
Hardware Specification | Yes | We use a Tesla V100-PCIE GPU with 32GB memory for all experiments and implement our model via the PyTorch framework.
Software Dependencies | No | The paper mentions implementing the model via the PyTorch framework and fine-tuning Llama2-7b but does not specify version numbers for these or any other software components.
Experiment Setup | Yes | The number of attention heads is set to 8. The batch size is 24. We set the learning rate as 2e-4. To keep consistency, we set the number of nodes of each relation graph to the maximum 20.
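The reported hyperparameters can be collected into a single configuration object for a reproduction attempt. This is a minimal sketch: only the values come from the paper; the field names are illustrative and not taken from the released code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SKANConfig:
    """Hyperparameters reported in the paper's experiment setup."""
    num_attention_heads: int = 8        # multi-head attention heads
    batch_size: int = 24
    learning_rate: float = 2e-4
    max_nodes_per_relation_graph: int = 20  # node cap per relation graph

cfg = SKANConfig()
print(cfg)
```

A frozen dataclass keeps the reproduction settings immutable and printable, so the exact configuration can be logged alongside each run.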