KGMark: A Diffusion Watermark for Knowledge Graphs
Authors: Hongrui Peng, Haolang Lu, Yuanlong Yu, Weiye Fu, Kun Wang, Guoshun Nan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various public benchmarks show the effectiveness of our proposed KGMark. Through rigorous testing, we demonstrate that KGMark achieves: ❶ high detectability, with a watermark detection AUC up to 0.99; ❷ preservation of KG quality, limiting downstream task performance loss to within 0.02% to 9.7%; and ❸ high robustness, retaining an AUC of around 0.95 against various post-editing attacks. |
| Researcher Affiliation | Academia | ¹Beijing University of Posts and Telecommunications, Beijing, China; ²Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1: Graph Alignment; Algorithm 2: Redundant Embedding Based on Subgraphs |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Datasets. We evaluate our approach using three public datasets representing diverse real-world scenarios: Last-FM (music) (Cano et al., 2017), MIND (news) (Wu et al., 2020), and Alibaba-iFashion (e-commerce) (Chen et al., 2019). Table 2 provides a summary of these datasets. |
| Dataset Splits | No | The paper mentions using datasets like Last-FM, MIND, and Alibaba-iFashion, but it does not specify any particular train/test/validation splits for these datasets. For the case study, it mentions, 'we restrict the user's click history by focusing on a strong interest in sports news. Additionally, we randomly sample 5 news items from the model's output for evaluation,' but this is not a general dataset split for reproducibility of all experiments. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A800. |
| Software Dependencies | No | The paper mentions using specific models like RotatE (Sun et al., 2019) and DKN (Wang et al., 2018) but does not provide specific version numbers for software libraries, frameworks, or solvers used in their implementation. |
| Experiment Setup | Yes | We first employ the RotatE (Sun et al., 2019) model to embed the knowledge graph, with an embedding dimension of 4096. Our watermarking method is applied to the above-processed datasets, and a series of related experiments is then carried out. The evaluations are conducted under a configuration where the DDIM inference steps are set to 75, and the predefined significance level is fixed at 5×10⁻⁵. The DKN model has been trained for 10 epochs on the MIND dataset. |
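To make the reported setup easy to scan, the hyperparameters quoted above can be collected into a single configuration object. This is a minimal sketch only: the paper releases no code, so the names (`EXPERIMENT_CONFIG`, `summarize`) and structure below are illustrative assumptions, while the values are the ones the paper reports.

```python
# Hypothetical configuration sketch; names are illustrative, values are
# the hyperparameters reported in the KGMark paper.
EXPERIMENT_CONFIG = {
    "kg_embedding_model": "RotatE",   # Sun et al., 2019
    "embedding_dim": 4096,            # KG embedding dimension
    "ddim_inference_steps": 75,       # diffusion sampling steps
    "significance_level": 5e-5,       # predefined detection threshold
    "dkn_training_epochs": 10,        # DKN trained on MIND
    "datasets": ["Last-FM", "MIND", "Alibaba-iFashion"],
    "hardware": "single NVIDIA A800",
}

def summarize(cfg: dict) -> str:
    """Render the reported setup as a one-line summary for logging."""
    return (f"{cfg['kg_embedding_model']} dim={cfg['embedding_dim']}, "
            f"DDIM steps={cfg['ddim_inference_steps']}, "
            f"alpha={cfg['significance_level']:g}")

print(summarize(EXPERIMENT_CONFIG))
```

A dictionary like this is also a convenient place to record the gaps the report flags (no dataset splits, no software versions) when attempting a reproduction.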