Robust Multi-bit Text Watermark with LLM-based Paraphrasers
Authors: Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation. |
| Researcher Affiliation | Collaboration | ¹ByteDance Research, ²Michigan State University, ³University of California, Santa Cruz. Correspondence to: Xiaojun Xu <EMAIL>. |
| Pseudocode | Yes | The encoding algorithm is shown in Alg. 1. We track the current watermark bit, and the next token is generated with the corresponding paraphraser θ_bit. After each generation step, we check whether the next token will be in a new segment by calculating S(x_w; mode=E). If a new segment starts, we update bit to be the next bit in the watermark message. |
| Open Source Code | Yes | We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark. |
| Open Datasets | Yes | The encoder and decoder are trained and evaluated on the C4 Real News Like dataset (Raffel et al., 2020), processed using standard settings in (Kirchenbauer et al., 2023; Xu et al., 2024; Lau et al., 2024). Without specification, we will use texts with 128 tokens for training and evaluation. |
| Dataset Splits | No | The encoder and decoder are trained and evaluated on the C4 Real News Like dataset (Raffel et al., 2020), processed using standard settings in (Kirchenbauer et al., 2023; Xu et al., 2024; Lau et al., 2024). Without specification, we will use texts with 128 tokens for training and evaluation. |
| Hardware Specification | No | We use a relatively small TinyLlama-1.1B model architecture (Zhang et al., 2024a) for θ_0, θ_1 and θ_d, as we observe that small models can already achieve a good performance in paraphrasing and watermarking. We show the experiments with larger Llama2-7b models in Appendix C. |
| Software Dependencies | No | We use a relatively small TinyLlama-1.1B model architecture (Zhang et al., 2024a) for θ_0, θ_1 and θ_d, as we observe that small models can already achieve a good performance in paraphrasing and watermarking. We show the experiments with larger Llama2-7b models in Appendix C. |
| Experiment Setup | Yes | We fine-tune the model for 10,000 steps with a batch size of 4. We use λw = 0.1, λs = 1.0 and λk = 0.02 as the coefficients. In the initialization stage, we generate the paraphrased data x_para^SFT with the Pegasus paraphraser (Zhang et al., 2020), and use λJS = 1.0 for the initialization loss. |
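The encoding loop quoted in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: `next_token` (standing in for sampling from paraphraser θ_bit) and `segment_start` (standing in for the segment check S(x_w; mode=E)) are hypothetical callables supplied by the caller; the real pipeline uses trained LLM paraphrasers.

```python
def encode_watermark(message, next_token, segment_start, max_tokens=64, eos=None):
    """Generate a paraphrase while embedding `message` (a list of bits).

    next_token(bit, tokens)  -- sample the next token from paraphraser theta_bit
    segment_start(tokens)    -- True when the next token begins a new segment,
                                i.e. the segment function signals a boundary
    """
    tokens = []
    idx = 0  # index of the watermark bit currently being embedded
    while len(tokens) < max_tokens:
        tok = next_token(message[idx % len(message)], tokens)
        if tok == eos:
            break
        tokens.append(tok)
        if segment_start(tokens):
            idx += 1  # advance to the next watermark bit
    return tokens


# Toy usage: paraphraser theta_0 always emits "a", theta_1 emits "b",
# and a new segment starts every 3 tokens.
toy_next = lambda bit, toks: "b" if bit else "a"
toy_segment = lambda toks: len(toks) % 3 == 0
out = encode_watermark([0, 1, 1], toy_next, toy_segment, max_tokens=9)
# -> ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b']
```

The toy run shows the intended behavior: each watermark bit controls which paraphraser generates tokens until the segment function flags a boundary, at which point encoding moves to the next bit.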