Can Watermarks be Used to Detect LLM IP Infringement For Free?

Authors: Zhengyue Zhao, Xiaogeng Liu, Somesh Jha, Patrick McDaniel, Bo Li, Chaowei Xiao

ICLR 2025

Reproducibility Assessment

Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments in a black-box scenario from the detector's perspective with LLM watermarks such as KGW (Kirchenbauer et al., 2023) and Unigram (Zhao et al., 2024), as well as different source LLMs, suspect LLMs, and datasets for tuning suspect LLMs. Results reveal that our proposed detection method increases the reliability of discriminating unauthorized distillation of source LLMs and further demonstrates successful cases of using LLM watermarks to defend against LLM model infringement. Specifically, our method achieves a detection accuracy of over 90% in cross-domain detection on a challenging model set containing suspect LLMs with multiple settings, while vanilla detection struggles to provide effective results.
Researcher Affiliation | Collaboration | Zhengyue Zhao 1, Xiaogeng Liu 1, Somesh Jha 1, Patrick McDaniel 1, Bo Li 2, Chaowei Xiao 1,3 (1 University of Wisconsin-Madison; 2 UIUC; 3 NVIDIA)
Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (e.g., Figure 1 for an overview of LLM IP infringement and detection) but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | https://github.com/ZhengyueZhao/llm_infringement_detection
Open Datasets | Yes | We use Llama-2-chat-7b (Touvron et al., 2023) and Llama-3-Instruct-8b (Meta, 2024) as source LLMs, and Bloom-7b (Le Scao et al., 2023) and Mistral-Instruct-7b (Jiang et al., 2023) as the base models for the suspect LLMs. For the queries used to sample fine-tuning data, we select two common domains: code generation (Evol-Instruct-Code (Luo et al., 2024)) and math problems (GSM8k (Cobbe et al., 2021)). During detection, we use queries from the Alpaca dataset (Taori et al., 2023) to sample text from the suspect LLMs.
Dataset Splits | No | The paper states, "training data of suspect models are sampled from source models with 5k queries in the code or math dataset," and describes the construction of a "model set containing positive samples (trained with watermarked data) and negative samples (trained with un-watermarked data)" by training "320 suspect LLMs... 160 are positive samples... 160 are negative samples." This describes the size of the query set used for training and how the evaluation set of models is constructed, but it does not specify explicit training/validation/test splits *within* the 5k queries used to train each suspect LLM, which would be needed to reproduce that training process.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing clusters used for running the experiments.
Software Dependencies | No | The paper mentions watermarking techniques like "Unigram (n = 0) and KGW (n = 1)" and that "Suspect models are tuned with LoRA (Hu et al., 2021)", but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | Suspect models are tuned with LoRA (Hu et al., 2021), with a batch size of 32, 4 epochs, and a constant learning rate of 1 × 10⁻⁴.
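The KGW and Unigram watermarks referenced above both work by biasing generation toward a pseudo-random "green list" of tokens and then detecting the watermark with a one-proportion z-test on the green-token count. The following is a minimal pure-Python sketch of that detection statistic, not the paper's implementation: the toy vocabulary size, the SHA-256 seeding scheme, and the green-list fraction `GAMMA` are illustrative assumptions.

```python
import hashlib
import math

GAMMA = 0.25       # assumed fraction of the vocabulary marked "green"
VOCAB_SIZE = 1000  # toy vocabulary for illustration

def is_green(prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """KGW-style (n = 1) membership test: the green list at position t is
    seeded by the previous token. Replacing prev_token with a constant seed
    would give a Unigram-style (n = 0) fixed green list instead."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    # Map the hash to [0, 1); the token is green if it falls below gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def z_score(tokens: list[int], gamma: float = GAMMA) -> float:
    """One-proportion z-test on the green-token count over t transitions:
    z = (green - gamma * t) / sqrt(gamma * (1 - gamma) * t)."""
    t = len(tokens) - 1
    green = sum(is_green(tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    return (green - gamma * t) / math.sqrt(gamma * (1 - gamma) * t)
```

Unwatermarked text yields z near 0, while text biased toward green tokens drives z far above typical detection thresholds (e.g. z > 4); the paper's contribution is making this signal survive in a *suspect* model fine-tuned on watermarked outputs, rather than in the watermarked text itself.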
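The experiment-setup row says suspect models are tuned with LoRA, which freezes the pretrained weight matrix W and trains only a low-rank update B·A. A minimal NumPy sketch of that forward pass is below; the batch size, epochs, and learning rate are the values quoted from the paper, while the rank and alpha are illustrative assumptions the paper does not state.

```python
import numpy as np

# Hyperparameters quoted from the paper's experiment setup.
BATCH_SIZE = 32
EPOCHS = 4
LEARNING_RATE = 1e-4
# Assumptions: typical LoRA rank/scaling, not specified in the paper.
RANK = 8
ALPHA = 16

def lora_forward(x, W, A, B, alpha=ALPHA, rank=RANK):
    """LoRA forward pass: y = x @ (W + (alpha / r) * B @ A).T

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are
    trained, shrinking trainable parameters from d_out * d_in to
    r * (d_in + d_out). B is initialized to zero, so training starts
    exactly at the frozen model."""
    return x @ (W + (alpha / rank) * B @ A).T
```

Because only A and B receive gradients, each of the paper's 320 suspect models is cheap to produce: the base model (Bloom-7b or Mistral-Instruct-7b) is shared and only the small adapter matrices differ per run.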