Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

Authors: Vaden Masrani, Mohammad Akbari, David Ming Xuan Yue, Ahmad Rezaei, Yong Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the proposed passthrough layers on a wide range of downstream tasks, and show experimentally that our watermarking method achieves near-perfect watermark extraction accuracy and false-positive rates in most cases without damaging original model performance.
Researcher Affiliation Industry Huawei Technologies Canada Co. Ltd.
Pseudocode No The paper describes methods using mathematical equations and descriptive text, but no distinct pseudocode or algorithm blocks are explicitly provided or labeled.
Open Source Code Yes Code https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=58b799a0-5cfc-4c2e-8b9b440bb2315264
Open Datasets Yes Following (Gu et al. 2023), we validate our method across 4 classification tasks and 7 datasets: SST2 (Socher et al. 2013), IMDB (Maas et al. 2011), SNLI (Bowman et al. 2015), MNLI (Williams, Nangia, and Bowman 2018), AGNews (Zhang, Zhao, and LeCun 2015), News Group (NG) (Lang 1995), and PAWS (Zhang, Baldridge, and He 2019), covering sentiment classification, entailment, paraphrase detection, and topic classification tasks.
Dataset Splits No The paper mentions fine-tuning for a certain number of epochs and using a pruning ratio, but does not provide specific train/test/validation dataset splits (e.g., exact percentages or sample counts) in the main text. It states that "Hyperparameter settings for each stage and additional details about how metrics are calculated are given in the Appendix (Masrani et al. 2024)", suggesting these details are reported there.
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only implies that computational resources were used by mentioning 'Wallclock times are reported in Table 1'.
Software Dependencies No The paper mentions using publicly available PLMs from Hugging Face and specific models such as BERT-base-uncased, GPT-2, and Llama2-7B. However, it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or Hugging Face Transformers library versions).
Experiment Setup Yes We add passthrough layers at positions {3,5,8} (PTL-358) to the pretrained BERT, and train it for 10K steps. All layers except the passthrough layers, head, and last layer are frozen. [...] we use GPT-2 with 124M parameters. [...] We add passthrough layers at positions {1}, {1,4,7}, and {1,3,5,7,9}, and train for 100k steps on the Open Web Text. [...] We fine-tune BERT described in the Classification Tasks Section for 10 epochs over 5 downstream tasks. [...] with a pruning ratio of 50% [...] followed by a fine-tuning round for 1 epoch
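The frozen-backbone setup quoted above can be sketched with the Hugging Face `transformers` API. This is a minimal illustration, not the authors' code: using plain `BertLayer` blocks as passthrough layers, interpreting {3,5,8} as indices in the final stack, and treating `model.pooler` as a stand-in for the task head are all assumptions made for the sketch.

```python
# Hypothetical sketch of the setup above: insert extra transformer blocks
# ("passthrough layers") into a BERT encoder at positions {3, 5, 8} and
# freeze everything except those blocks, the last encoder layer, and the head.
# Using plain BertLayer modules here is an assumption; the paper's actual
# passthrough-layer architecture may differ.
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel
from transformers.models.bert.modeling_bert import BertLayer

def add_passthrough_layers(model: BertModel, positions):
    """Insert freshly initialized BertLayer blocks at the given final indices."""
    layers = list(model.encoder.layer)
    inserted = []
    for pos in sorted(positions):  # ascending insertion keeps final indices correct
        layer = BertLayer(model.config)
        layers.insert(pos, layer)
        inserted.append(layer)
    model.encoder.layer = nn.ModuleList(layers)
    model.config.num_hidden_layers = len(layers)
    return inserted

def freeze_all_except(model: nn.Module, trainable_modules):
    """Freeze every parameter, then re-enable gradients on selected modules."""
    for p in model.parameters():
        p.requires_grad = False
    for m in trainable_modules:
        for p in m.parameters():
            p.requires_grad = True

config = BertConfig()   # 12-layer BERT-base shape; random init keeps the sketch offline
model = BertModel(config)
inserted = add_passthrough_layers(model, {3, 5, 8})
# Trainable: passthrough layers, last encoder layer, and the pooler (head proxy).
freeze_all_except(model, inserted + [model.encoder.layer[-1], model.pooler])

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"encoder layers: {len(model.encoder.layer)}, trainable params: {trainable:,}")
```

The passthrough layers are the only new capacity in the network, so training for 10K steps updates the watermark-carrying blocks (plus the head and last layer) while the pretrained backbone stays fixed.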