LLM4VKG: Leveraging Large Language Models for Virtual Knowledge Graph Construction
Authors: Guohui Xiao, Lin Ren, Guilin Qi, Haohan Xue, Marco Di Panfilo, Davide Lanti
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluation on the RODI benchmark demonstrates that LLM4VKG surpasses state-of-the-art methods, achieving an average F1-score improvement of +17% and a peak gain of +39%. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Southeast University, Nanjing, China; Free University of Bozen-Bolzano, Italy |
| Pseudocode | No | The paper describes the methodology in prose and includes SPARQL queries in Section 4.1, but it does not contain any clearly labeled pseudocode or algorithm blocks describing the LLM4VKG framework or its components. |
| Open Source Code | Yes | All code and datasets associated with this work are publicly available at https://github.com/Homura T/LLM4VKG |
| Open Datasets | Yes | We evaluate LLM4VKG on RODI [Pinkel et al., 2018] and RODI-T (x%), a variant of RODI in which x% of the ontology vocabulary is removed from the ontology starting from the leaf nodes. |
| Dataset Splits | No | The paper describes the structure of RODI benchmark samples and how queries are evaluated, but it does not specify how the data is partitioned into training, validation, or test sets for LLM4VKG's operation or evaluation. It states that 'A RODI sample comprises three main elements: a database schema, a golden ontology, and a set of query pairs', which describes the nature of the samples but not their partitioning for experimentation. |
| Hardware Specification | No | The paper thanks the Big Data Computing Center of Southeast University for 'facility support on the numerical calculations' but does not give specific hardware details such as GPU/CPU models or memory amounts used for the experiments. It names the LLMs used as backbone models (GPT-4o, Qwen2.5-7b) but not the hardware on which they were run. |
| Software Dependencies | Yes | In this study, we leverage the VKG system Ontop [Calvanese et al., 2017] to initialize the generated VKG based on a database connection, an ontology, and a set of mappings. For Retriever, we use bge-m3 [Chen et al., 2024] as the backbone model. For Matcher and Namer, we incorporate GPT-4o, GPT-4o-mini [Open AI, 2024a], and Qwen2.5-7b [Qwen, 2024] as backbone models, representing various levels of performance across LLMs. |
| Experiment Setup | No | The paper mentions that the Retriever module uses a pre-trained sentence similarity language model to retrieve 'top-n candidate elements' where 'n is a hyperparameter,' but the specific value for 'n' is not provided. It also states that 'The detailed prompts for the modules are in Appendix B,' implying some experimental details are not in the main text. Specific hyperparameters like learning rates, batch sizes, or optimizer settings are not described in the main body of the paper. |
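The Retriever row above notes that candidates are selected as the top-n elements by sentence-embedding similarity (bge-m3 as the backbone, with n an unreported hyperparameter). A minimal, self-contained sketch of such top-n retrieval via cosine similarity, using toy hand-written vectors in place of actual bge-m3 embeddings (the element names, vectors, and n value below are illustrative assumptions, not taken from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n(query_vec, candidates, n):
    """Return the n candidate names most similar to the query embedding."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:n]]

# Toy stand-ins for embeddings of ontology vocabulary elements.
ontology_elements = {
    "Person":   [0.9, 0.1, 0.0],
    "Employee": [0.8, 0.2, 0.1],
    "Project":  [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]  # toy embedding of a database column description
print(top_n(query, ontology_elements, n=2))  # → ['Person', 'Employee']
```

In the actual framework, the query and candidate vectors would come from the bge-m3 sentence-embedding model rather than being fixed by hand.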