Can Watermarks be Used to Detect LLM IP Infringement For Free?

Authors: Zhengyue Zhao, Xiaogeng Liu, Somesh Jha, Patrick McDaniel, Bo Li, Chaowei Xiao

ICLR 2025

Reproducibility Assessment

Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments in a black-box scenario from the detector's perspective with LLM watermarks such as KGW (Kirchenbauer et al., 2023) and Unigram (Zhao et al., 2024), as well as different source LLMs, suspect LLMs, and datasets for tuning suspect LLMs. Results reveal that our proposed detection method increases the reliability of discriminating unauthorized distillation of source LLMs and further demonstrates successful cases of using LLM watermarks to defend against LLM model infringement. Specifically, our method achieves a detection accuracy of over 90% in cross-domain detection on a challenging model set containing suspect LLMs with multiple settings, while vanilla detection struggles to provide effective results.
Researcher Affiliation | Collaboration | Zhengyue Zhao 1, Xiaogeng Liu 1, Somesh Jha 1, Patrick McDaniel 1, Bo Li 2, Chaowei Xiao 1,3 (1 University of Wisconsin-Madison; 2 UIUC; 3 NVIDIA)
Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (e.g., Figure 1 for an overview of LLM IP infringement and detection) but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | https://github.com/ZhengyueZhao/llm_infringement_detection
Open Datasets | Yes | We use Llama-2-chat-7b (Touvron et al., 2023) and Llama-3-Instruct-8b (Meta, 2024) as source LLMs, and Bloom-7b (Le Scao et al., 2023) and Mistral-Instruct-7b (Jiang et al., 2023) as the base models for the suspect LLMs. For the queries used to sample fine-tuning data, we select two common domains: code generation (Evol-Instruct-Code (Luo et al., 2024)) and math problems (GSM8k (Cobbe et al., 2021)). During detection, we use queries from the Alpaca dataset (Taori et al., 2023) to sample text from the suspect LLMs.
Dataset Splits | No | The paper states, "training data of suspect models are sampled from source models with 5k queries in the code or math dataset," and describes the construction of a "model set containing positive samples (trained with watermarked data) and negative samples (trained with un-watermarked data)" by training "320 suspect LLMs... 160 are positive samples... 160 are negative samples." This describes the size of the query set used for training and how the evaluation set of models is constructed, but it does not specify explicit training/validation/test splits *within* the 5k queries used to train each suspect LLM, which would be needed to reproduce that training process.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing clusters used for running the experiments.
Software Dependencies | No | The paper mentions watermarking techniques like "Unigram (n = 0) and KGW (n = 1)" and that "Suspect models are tuned with LoRA (Hu et al., 2021)", but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | Suspect models are tuned with LoRA (Hu et al., 2021), with a batch size of 32, 4 epochs, and a constant learning rate of 1 × 10⁻⁴.
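The KGW and Unigram watermarks referenced above both work by biasing generation toward a pseudo-random "green list" of tokens and then detecting the watermark with a one-proportion z-test on the green-token count. The following is a minimal pure-Python sketch of that detection statistic, not the paper's implementation: the toy vocabulary size, the SHA-256 seeding scheme, and the green-list fraction `GAMMA` are illustrative assumptions.

```python
import hashlib
import math

GAMMA = 0.25       # assumed fraction of the vocabulary marked "green"
VOCAB_SIZE = 1000  # toy vocabulary for illustration

def is_green(prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """KGW-style (n = 1) membership test: the green list at position t is
    seeded by the previous token. Replacing prev_token with a constant seed
    would give a Unigram-style (n = 0) fixed green list instead."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    # Map the hash to [0, 1); the token is green if it falls below gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def z_score(tokens: list[int], gamma: float = GAMMA) -> float:
    """One-proportion z-test on the green-token count over t transitions:
    z = (green - gamma * t) / sqrt(gamma * (1 - gamma) * t)."""
    t = len(tokens) - 1
    green = sum(is_green(tokens[i - 1], tokens[i]) for i in range(1, len(tokens)))
    return (green - gamma * t) / math.sqrt(gamma * (1 - gamma) * t)
```

Unwatermarked text yields z near 0, while text biased toward green tokens drives z far above typical detection thresholds (e.g. z > 4); the paper's contribution is making this signal survive in a *suspect* model fine-tuned on watermarked outputs, rather than in the watermarked text itself.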
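The experiment-setup row says suspect models are tuned with LoRA, which freezes the pretrained weight matrix W and trains only a low-rank update B·A. A minimal NumPy sketch of that forward pass is below; the batch size, epochs, and learning rate are the values quoted from the paper, while the rank and alpha are illustrative assumptions the paper does not state.

```python
import numpy as np

# Hyperparameters quoted from the paper's experiment setup.
BATCH_SIZE = 32
EPOCHS = 4
LEARNING_RATE = 1e-4
# Assumptions: typical LoRA rank/scaling, not specified in the paper.
RANK = 8
ALPHA = 16

def lora_forward(x, W, A, B, alpha=ALPHA, rank=RANK):
    """LoRA forward pass: y = x @ (W + (alpha / r) * B @ A).T

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are
    trained, shrinking trainable parameters from d_out * d_in to
    r * (d_in + d_out). B is initialized to zero, so training starts
    exactly at the frozen model."""
    return x @ (W + (alpha / rank) * B @ A).T
```

Because only A and B receive gradients, each of the paper's 320 suspect models is cheap to produce: the base model (Bloom-7b or Mistral-Instruct-7b) is shared and only the small adapter matrices differ per run.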