KGMark: A Diffusion Watermark for Knowledge Graphs

Authors: Hongrui Peng, Haolang Lu, Yuanlong Yu, Weiye Fu, Kun Wang, Guoshun Nan

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Experiments on various public benchmarks show the effectiveness of our proposed KGMark. Through rigorous testing, we demonstrate that KGMark achieves: ❶ high detectability, with a watermark detection AUC up to 0.99; ❷ maintaining KG quality and limiting downstream task performance loss to within the range of 0.02%–9.7%; and ❸ high robustness, retaining an AUC of around 0.95 against various post-editing attacks.
Researcher Affiliation: Academia. ¹Beijing University of Posts and Telecommunications, Beijing, China; ²Nanyang Technological University, Singapore.
Pseudocode: Yes. Algorithm 1 (Graph Alignment) and Algorithm 2 (Redundant Embedding Based on Subgraphs).
Open Source Code: No. The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets: Yes. Datasets. We evaluate our approach using three public datasets representing diverse real-world scenarios: Last-FM (music) (Cano et al., 2017), MIND (news) (Wu et al., 2020), and Alibaba-iFashion (e-commerce) (Chen et al., 2019). Table 2 provides a summary of these datasets.
Dataset Splits: No. The paper mentions using datasets like Last-FM, MIND, and Alibaba-iFashion, but it does not specify any particular train/test/validation splits for these datasets. For the case study, it mentions, 'we restrict the user's click history by focusing on a strong interest in sports news. Additionally, we randomly sample 5 news items from the model's output for evaluation,' but this is not a general dataset split for reproducibility of all experiments.
Hardware Specification: Yes. All experiments are conducted on a single NVIDIA A800.
Software Dependencies: No. The paper mentions using specific models like RotatE (Sun et al., 2019) and DKN (Wang et al., 2018) but does not provide specific version numbers for the software libraries, frameworks, or solvers used in their implementation.
Experiment Setup: Yes. We first employ the RotatE (Sun et al., 2019) model to embed the knowledge graph, with an embedding dimension of 4096. Our watermarking method is applied to the above-processed datasets, and a subsequent series of related experiments is carried out. The evaluations are conducted under a configuration where the DDIM inference steps are set to 75 and the predefined significance level is fixed at 5×10⁻⁵. The DKN model has been trained for 10 epochs on the MIND dataset.
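The two reported quantities above (a detection AUC and a predefined significance level of 5×10⁻⁵) can be illustrated with a minimal sketch of how a watermark detection decision is typically evaluated. This is not the paper's implementation: the function names, the score-based formulation, and the Gaussian null-distribution assumption are all illustrative choices of this review.

```python
import math

def detection_auc(wm_scores, clean_scores):
    """Rank-based AUC: the probability that a watermarked graph's
    detection score outranks a clean graph's score (ties count 0.5)."""
    wins = 0.0
    for w in wm_scores:
        for c in clean_scores:
            if w > c:
                wins += 1.0
            elif w == c:
                wins += 0.5
    return wins / (len(wm_scores) * len(clean_scores))

def z_threshold(alpha):
    """One-sided standard-normal threshold for significance level alpha,
    found by bisection on the survival function (stdlib only)."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(Z > mid) for Z ~ N(0, 1)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > alpha:
            lo = mid  # survival still above alpha: threshold is larger
        else:
            hi = mid
    return hi

ALPHA = 5e-5  # the predefined significance level quoted in the setup

def is_watermarked(score, clean_mean, clean_std):
    """Flag a graph as watermarked when its detection score exceeds the
    alpha-level threshold under an assumed Gaussian clean-score null."""
    z = (score - clean_mean) / clean_std
    return z > z_threshold(ALPHA)
```

At alpha = 5×10⁻⁵ the one-sided threshold is roughly 3.9 standard deviations above the clean-score mean, which is what makes a sustained AUC near 0.95–0.99 under post-editing attacks a meaningful claim.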