K-ON: Stacking Knowledge on the Head Layer of Large Language Model
Authors: Lingbing Guo, Yichi Zhang, Zhongpu Bo, Zhuo Chen, Mengshu Sun, Zhiqiang Zhang, Wen Zhang, Huajun Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even the other modalities. [...] We evaluate K-ON on the KG completion task without any simplification on the task setting. Our experiments demonstrate that K-ON not only outperforms the conventional methods, but also achieves better performance than the multi-modal methods that leverage additional textual and visual information. [...] The main experimental results are shown in Table 2. [...] We conduct ablation studies to verify the effectiveness of each module in K-ON. |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University 2ZJU-Ant Group Joint Lab of Knowledge Graph 3Ant Group |
| Pseudocode | Yes | Algorithm 1: K-ON for KG Completion |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. It mentions 'We present Algorithm 1 to illustrate the implementation of K-ON step by step' but this is not a statement of open-sourcing the code. |
| Open Datasets | Yes | We consider DB15K and MKGW as benchmark, which are widely used in many recent works (Xie et al. 2017; Xu et al. 2022; Lee et al. 2023; Zhang, Chen, and Zhang 2023; Zhang et al. 2024c). The two datasets include not only the structural triplet data, but also the rich information of text and others. Thereby, we believe conducting experiments on them can gain a more comprehensive understanding on different methods and ensure a fairer comparison. The statistics of these two datasets are shown in Table 1. |
| Dataset Splits | Yes | Table 1: Statistics of the datasets. DB15K: 12,842 entities, 279 relations, 79,222 train / 9,902 valid / 9,904 test triples, 12,842 texts, 12,818 images. MKGW: 15,000 entities, 169 relations, 34,196 train / 4,276 valid / 4,274 test triples, 14,123 texts, 14,463 images. |
| Hardware Specification | Yes | We employ Llama-2-chat-7B (Touvron et al. 2023) as the base LLM model and train K-ON with 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'AdamW (Kingma and Ba 2015) as the optimizer' and 'Llama-2-chat-7B (Touvron et al. 2023) as the base LLM model', but it does not provide specific version numbers for any key software components or libraries used in their implementation. |
| Experiment Setup | Yes | The learning rate is set to 1e-4 in all experiments, and we use AdamW (Kingma and Ba 2015) as the optimizer. The batch size per device is set to 12 and the gradient accumulation is set to 8 to obtain a larger batch size. [...] The overall fine-tuning time is less than 1 hour on the DB15K dataset with 8 A100 GPUs. |
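The reported setup combines a per-device batch size with gradient accumulation and multi-GPU training, which determines the effective batch size per optimizer step. The sketch below is illustrative only: the hyperparameter values are taken from the paper, but the `TrainConfig` class and its field names are assumptions, not the authors' code.

```python
# Hedged sketch of the K-ON fine-tuning configuration. Numeric values
# (lr 1e-4, per-device batch 12, accumulation 8, 8 GPUs) come from the
# paper; the dataclass itself is a hypothetical illustration.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    learning_rate: float = 1e-4       # "learning rate is set to 1e-4"
    per_device_batch_size: int = 12   # "batch size per device is set to 12"
    grad_accumulation_steps: int = 8  # "gradient accumulation is set to 8"
    num_gpus: int = 8                 # "8 A100 GPUs"

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update across all devices:
        # per-device batch x accumulation steps x number of GPUs.
        return (self.per_device_batch_size
                * self.grad_accumulation_steps
                * self.num_gpus)


cfg = TrainConfig()
print(cfg.effective_batch_size)  # 12 * 8 * 8 = 768
```

Under these numbers, each optimizer step sees 768 samples, which is the "larger batch size" the accumulation is meant to achieve.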