Enhancing Multi-Hop Fact Verification with Structured Knowledge-Augmented Large Language Models
Authors: Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several commonly used multi-hop fact verification datasets, FEVER (Thorne et al. 2018b) and HOVER (Jiang et al. 2020), to assess the effectiveness of LLM-SKAN. The experimental results on four commonly used datasets demonstrate the effectiveness and superiority of our model. |
| Researcher Affiliation | Academia | Han Cao1,2, Lingwei Wei1*, Wei Zhou1, Songlin Hu1,2; 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using natural language descriptions and mathematical equations, such as for the LLM-driven Knowledge Extractor prompt, graph neural network updates, and classification, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be released at https://github.com/HanCao12/LLM-SKAN |
| Open Datasets | Yes | To evaluate the effectiveness of LLM-SKAN for both single-hop and multi-hop fact verification tasks, we choose 4 public benchmarks, FEVER (Thorne et al. 2018b) and 2-, 3-, and 4-hop HOVER (Jiang et al. 2020), to conduct experiments. |
| Dataset Splits | Yes | The statistics are shown in Table 2. FEVER: 145,449 train / 19,998 dev / 19,998 test; 2-hop HOVER: 9,052 / 1,126 / 1,333; 3-hop HOVER: 6,084 / 1,835 / 1,333; 4-hop HOVER: 33,035 / 1,039 / 1,333 |
| Hardware Specification | Yes | We use a Tesla V100-PCIE GPU with 32GB memory for all experiments and implement our model via the PyTorch framework. |
| Software Dependencies | No | The paper mentions implementing the model via the 'Pytorch framework' and fine-tuning 'Llama2-7b' but does not specify version numbers for these or any other software components. |
| Experiment Setup | Yes | The number of attention heads is set to 8. The batch size is 24. We set the learning rate as 2e-4. To keep consistency, we set the number of nodes of each relation graph to the maximum 20. |
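The reported experiment setup can be collected into a minimal configuration sketch. This is a hypothetical layout for illustration, not the authors' released code; the class and field names are assumptions, and only the numeric values (8 attention heads, batch size 24, learning rate 2e-4, 20-node relation graphs) come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the paper's experiment setup."""
    num_attention_heads: int = 8
    batch_size: int = 24
    learning_rate: float = 2e-4
    max_graph_nodes: int = 20  # nodes per relation graph, capped at 20


def cap_nodes(nodes, max_nodes=20, pad=None):
    """Truncate overlong node lists and pad short ones to a fixed size,
    mirroring the fixed number of nodes per relation graph in the setup.
    The padding value is an assumption; the paper does not specify one."""
    return (list(nodes) + [pad] * max_nodes)[:max_nodes]


cfg = TrainConfig()
print(cfg.learning_rate, len(cap_nodes(range(25), cfg.max_graph_nodes)))
```

Freezing the dataclass keeps the reported settings immutable, which makes a reproducibility run easier to audit.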