Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Authors: Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation.
Researcher Affiliation Collaboration 1 ByteDance Research, 2 Michigan State University, 3 University of California, Santa Cruz. Correspondence to: Xiaojun Xu <EMAIL>.
Pseudocode Yes The encoding algorithm is shown in Alg. 1. We track the current watermark bit, and the next token is generated with the corresponding paraphraser θ_bit. After each generation step, we check whether the next token will be in a new segment by calculating S(x_w; mode=E). If a new segment starts, we update bit to the next bit in the watermark message.
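The quoted encoding loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `paraphrasers` stands in for the bit-conditioned paraphrasers θ_0/θ_1 (here, callables that emit one token given the text so far), and `segment_fn` stands in for S(x_w; mode=E), returning True when the last generated token closes a segment. All names and signatures are assumptions.

```python
def encode_watermark(message_bits, prompt_tokens, paraphrasers, segment_fn, max_tokens=128):
    """Sketch of the multi-bit encoding loop described in Alg. 1.

    message_bits:  list of 0/1 bits to embed.
    prompt_tokens: initial token list (may be empty).
    paraphrasers:  dict {0: fn, 1: fn}; each fn(tokens) -> next token,
                   playing the role of paraphraser θ_bit.
    segment_fn:    fn(tokens) -> bool; True when a segment boundary is
                   reached (stand-in for S(x_w; mode=E)).
    """
    x_w = list(prompt_tokens)
    bit_idx = 0  # track the current watermark bit
    while len(x_w) < max_tokens and bit_idx < len(message_bits):
        bit = message_bits[bit_idx]
        x_w.append(paraphrasers[bit](x_w))  # generate with θ_bit
        if segment_fn(x_w):                 # new segment -> advance to next bit
            bit_idx += 1
    return x_w
```

With toy paraphrasers that always emit "a" (bit 0) or "b" (bit 1) and a segment boundary every 3 tokens, embedding the message [0, 1] yields three "a" tokens followed by three "b" tokens.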
Open Source Code Yes We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.
Open Datasets Yes The encoder and decoder are trained and evaluated on the C4 Real News Like dataset (Raffel et al., 2020), processed using standard settings in (Kirchenbauer et al., 2023; Xu et al., 2024; Lau et al., 2024). Unless otherwise specified, we use texts of 128 tokens for training and evaluation.
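The fixed-length preprocessing the quote describes ("texts with 128 tokens") can be sketched as a simple chunking step. This is only illustrative: the paper's pipeline uses a subword tokenizer over C4 Real News Like, whereas this helper just slices an already-tokenized sequence into 128-token samples, dropping the remainder.

```python
def chunk_to_fixed_length(token_ids, length=128):
    """Split a tokenized document into non-overlapping fixed-length samples.

    Stand-in for the standard C4 preprocessing referenced in the paper;
    any trailing partial chunk shorter than `length` is discarded.
    """
    return [
        token_ids[i : i + length]
        for i in range(0, len(token_ids) - length + 1, length)
    ]
```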
Dataset Splits No The encoder and decoder are trained and evaluated on the C4 Real News Like dataset (Raffel et al., 2020), processed using standard settings in (Kirchenbauer et al., 2023; Xu et al., 2024; Lau et al., 2024). Unless otherwise specified, we use texts of 128 tokens for training and evaluation.
Hardware Specification No We use a relatively small TinyLlama-1.1B model architecture (Zhang et al., 2024a) for θ0, θ1 and θd, as we observe that small models can already achieve good performance in paraphrasing and watermarking. We show the experiments with larger Llama2-7b models in Appendix C.
Software Dependencies No We use a relatively small TinyLlama-1.1B model architecture (Zhang et al., 2024a) for θ0, θ1 and θd, as we observe that small models can already achieve good performance in paraphrasing and watermarking. We show the experiments with larger Llama2-7b models in Appendix C.
Experiment Setup Yes We fine-tune the model for 10,000 steps with a batch size of 4. We use λw = 0.1, λs = 1.0 and λk = 0.02 as the coefficients. In the initialization stage, we generate the paraphrased data x^SFT_para with the Pegasus paraphraser (Zhang et al., 2020), and use λJS = 1.0 for the initialization loss.
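The reported hyperparameters can be collected into a single config for reference. The coefficient values (10,000 steps, batch size 4, λw = 0.1, λs = 1.0, λk = 0.02, λJS = 1.0) are quoted from the paper; the loss-term names and the weighted-sum form below are assumptions inferred from the coefficient subscripts, not the paper's stated objective.

```python
# Values quoted from the experiment setup; key names are illustrative.
TRAIN_CONFIG = {
    "steps": 10_000,
    "batch_size": 4,
    "lambda_w": 0.1,    # assumed: watermark (decoding) loss weight
    "lambda_s": 1.0,    # assumed: semantic-preservation loss weight
    "lambda_k": 0.02,   # assumed: KL regularization weight
    "lambda_js": 1.0,   # JS loss weight for the initialization stage
}


def total_loss(loss_w, loss_s, loss_k, cfg=TRAIN_CONFIG):
    """Hypothetical weighted-sum objective combining the three coefficients."""
    return (
        cfg["lambda_w"] * loss_w
        + cfg["lambda_s"] * loss_s
        + cfg["lambda_k"] * loss_k
    )
```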