K-ON: Stacking Knowledge on the Head Layer of Large Language Model
Authors: Lingbing Guo, Yichi Zhang, Zhongpu Bo, Zhuo Chen, Mengshu Sun, Zhiqiang Zhang, Wen Zhang, Huajun Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even the other modalities. [...] We evaluate K-ON on the KG completion task without any simplification on the task setting. Our experiments demonstrate that K-ON not only outperforms the conventional methods, but also achieves better performance than the multi-modal methods that leverage additional textual and visual information. [...] The main experimental results are shown in Table 2. [...] We conduct ablation studies to verify the effectiveness of each module in K-ON. |
| Researcher Affiliation | Collaboration | 1College of Computer Science and Technology, Zhejiang University 2ZJU-Ant Group Joint Lab of Knowledge Graph 3Ant Group |
| Pseudocode | Yes | Algorithm 1: K-ON for KG Completion |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository. It mentions 'We present Algorithm 1 to illustrate the implementation of K-ON step by step' but this is not a statement of open-sourcing the code. |
| Open Datasets | Yes | We consider DB15K and MKGW as benchmark, which are widely used in many recent works (Xie et al. 2017; Xu et al. 2022; Lee et al. 2023; Zhang, Chen, and Zhang 2023; Zhang et al. 2024c). The two datasets include not only the structural triplet data, but also the rich information of text and others. Thereby, we believe conducting experiments on them can gain a more comprehensive understanding on different methods and ensure a fairer comparison. The statistics of these two datasets are shown in Table 1. |
| Dataset Splits | Yes | Table 1: Statistics of the datasets. DB15K: 12,842 entities, 279 relations, 79,222 train / 9,902 valid / 9,904 test triples, 12,842 texts, 12,818 images. MKGW: 15,000 entities, 169 relations, 34,196 train / 4,276 valid / 4,274 test triples, 14,123 texts, 14,463 images. |
| Hardware Specification | Yes | We employ Llama-2-chat-7B (Touvron et al. 2023) as the base LLM model and train K-ON with 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'AdamW (Kingma and Ba 2015) as the optimizer' and 'Llama-2-chat-7B (Touvron et al. 2023) as the base LLM model', but it does not provide specific version numbers for any key software components or libraries used in their implementation. |
| Experiment Setup | Yes | The learning rate is set to 1e-4 in all experiments, and we use AdamW (Kingma and Ba 2015) as the optimizer. The batch size per device is set to 12 and the gradient accumulation is set to 8 to obtain a larger batch size. [...] The overall fine-tuning time is less than 1 hour on the DB15K dataset with 8 A100 GPUs. |
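The reported setup combines a per-device batch size with gradient accumulation and multi-GPU training, which determines the effective batch size per optimizer step. The sketch below is illustrative only: the hyperparameter values are taken from the paper, but the `TrainConfig` class and its field names are assumptions, not the authors' code.

```python
# Hedged sketch of the K-ON fine-tuning configuration. Numeric values
# (lr 1e-4, per-device batch 12, accumulation 8, 8 GPUs) come from the
# paper; the dataclass itself is a hypothetical illustration.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    learning_rate: float = 1e-4       # "learning rate is set to 1e-4"
    per_device_batch_size: int = 12   # "batch size per device is set to 12"
    grad_accumulation_steps: int = 8  # "gradient accumulation is set to 8"
    num_gpus: int = 8                 # "8 A100 GPUs"

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update across all devices:
        # per-device batch x accumulation steps x number of GPUs.
        return (self.per_device_batch_size
                * self.grad_accumulation_steps
                * self.num_gpus)


cfg = TrainConfig()
print(cfg.effective_batch_size)  # 12 * 8 * 8 = 768
```

Under these numbers, each optimizer step sees 768 samples, which is the "larger batch size" the accumulation is meant to achieve.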