Select and Augment: Enhanced Dense Retrieval Knowledge Graph Augmentation
Authors: Micheal Abaho, Yousef H. Alfaifi
JAIR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results for Link Prediction demonstrate a 5.5% and 3.5% increase in the Mean Reciprocal Rank (MRR) and Hits@10 scores respectively, in comparison to text-enhanced knowledge graph augmentation methods using traditional CNNs. Our proposed method is evaluated on KG completion tasks (described under Section 4.3) such as link and relation prediction using the Freebase FB15k dataset (Veira et al., 2019). |
| Researcher Affiliation | Academia | Micheal Abaho EMAIL University of Liverpool, United Kingdom Yousef H. Alfaifi EMAIL Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We adopt the Freebase (FB15K) Knowledge graph, the Babelnet corpus (Navigli & Ponzetto, 2012), Google News dataset, and Wikipedia articles (Veira et al., 2019), from which we assemble text descriptions for all the KG entities. |
| Dataset Splits | Yes | After pre-processing the gathered entity descriptions, the dataset is split into training, validation and test sets, and the resultant dataset statistics are presented in Table 4. Table 4 provides a breakdown of the Train, Validation (Val) and Test splits used in our experiments and the 3.8M entity descriptions which contain 96% of the entities within the KB. As shown in the table, the text descriptions respectively cover 51.8%, 30.4% and 29.9% of the Training, Validation and Testing triples. Table 1 (dataset statistics for both KB and text corpus): FB15K — 1,341; 14,904; 472,860 / 57,803 / 48,991 (train/val/test triples). Text — 3,814,190; 14,308; 244,946 / 17,572 / 14,599 (train/val/test triples). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using a 'pre-trained SBERT model' and initializing KGE models such as 'TransE', 'DistMult', 'ComplEx', and 'RotatE'. However, it does not specify software dependencies such as programming languages or library versions (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The number of negative samples per triple is set to 100 and k is set to 5. We tune all hyper-parameters using the validation data, and obtain optimal values as follows: learning rate 1e-3, batch size 8, KG embedding size 200. Further details on tuning bounds are provided in Table 4. Table 2 (parameter settings for DRKA): KG embedding dimension [50, 100, 200, 300], optimal 200; γ [0.5, 1.0, 1.5, 2.0], optimal 1.0; optimizer [SGD, Adam], optimal Adam; epochs [20, 50, 70, 100, 120], optimal 70; learning rate [5e-4, 1e-4, 5e-3, 1e-3, 5e-2, 1e-2], optimal 1e-3. |
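
The MRR and Hits@10 figures quoted above are the standard rank-based link-prediction metrics. As a minimal sketch (the `ranks` values below are illustrative, not taken from the paper), they can be computed from the rank of each gold entity among the scored candidates:

```python
def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of test triples whose gold entity ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Illustrative ranks for five test triples (not from the paper)
ranks = [1, 3, 12, 2, 50]
print(round(mrr(ranks), 3))    # 0.387
print(hits_at_k(ranks, k=10))  # 0.6
```

A reported "5.5% increase in MRR" over a baseline would therefore correspond to gold entities being ranked higher on average across the FB15K test triples.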
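
The setup row states that 100 negative samples are drawn per triple. A common way to realise this, sketched below under the assumption of uniform corruption of either the head or the tail entity (the paper does not state its corruption strategy, so this is illustrative only):

```python
import random

def corrupt_triple(triple, entities, n_neg=100, seed=0):
    """Generate n_neg corrupted copies of (h, r, t) by replacing h or t
    with a uniformly sampled entity, discarding accidental copies of the
    original triple."""
    rng = random.Random(seed)
    h, r, t = triple
    negatives = []
    while len(negatives) < n_neg:
        e = rng.choice(entities)
        # Corrupt the head or the tail with equal probability
        neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if neg != triple:
            negatives.append(neg)
    return negatives

# Hypothetical toy triple and entity vocabulary, for illustration only
negs = corrupt_triple(("liverpool", "located_in", "uk"),
                      entities=["liverpool", "uk", "tabuk", "paris"],
                      n_neg=100)
print(len(negs))  # 100
```

Note that this sketch does not filter corruptions that happen to be other true triples in the KG (so-called "filtered" evaluation); whether the paper applies that filter is not specified in the extracted quotes.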