Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show SIRA achieves nearly 100% attack success rates on seven recent watermarking methods at a cost of only $0.88 per million tokens. Our work exposes a widely prevalent vulnerability in current watermarking algorithms.
Researcher Affiliation | Academia | 1University of British Columbia, 2Vector Institute for AI, 3Fudan University, 4University of Cambridge, 5Canada CIFAR AI Chair, 6NSERC CRC Chair. Correspondence to: Yixin Cheng <EMAIL>.
Pseudocode | Yes | The pseudocode of our algorithm is shown in Algorithm 1. Algorithm 1: Pseudocode for the Self-Information Rewrite Attack
Open Source Code | No | The source code is available at SIRA. ... We will release our code to the community to facilitate further research in developing responsible AI practices and advancing the robustness of watermarking algorithms.
Open Datasets | Yes | Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. ... We conduct additional experiments on the OpenGen dataset (Krishna et al., 2024), which consists of sampled passages from WikiText-103.
Dataset Splits | Yes | Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. We selected 500 random samples from the test set to serve as prompts for generating the subsequent 230 tokens, using the original C4 texts as non-watermarked examples. ... We used 500 generated attack texts as positive samples and 500 human-written texts as negative samples.
Hardware Specification | Yes | Our method runs on NVIDIA A100 GPUs (Tiny and Small run on a single GPU). ... The experiments were run on NVIDIA A100 40GB GPUs, utilizing a sequential device map for baseline methods requiring multiple GPUs.
Software Dependencies | No | We use the Hugging Face library in our experiments. No specific version numbers were provided for software dependencies.
Experiment Setup | Yes | For our method, we use ϵ = 0.3 as the threshold. ... The watermark hyperparameter settings are shown in Appendix A, and the detection settings adhere to the default/recommended configurations (Pan et al., 2024) of the original works. Specifically, for KGW-k, k is the number of preceding tokens to hash. ... For DIPPER-1 the lex diversity is 60 without order diversity, and for DIPPER-2 we additionally increase the order diversity by 40. The word deletion ratio is set to 0.3 and the synonym substitution ratio is set to 0.5. ... The temperature for the base model is set to 0.7.
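The evaluation split in the Dataset Splits row can be sketched in Python. This is a hedged reconstruction, not the authors' code: the function name, random seed, and prompt length are our own assumptions; only the counts (500 random prompts, with the original C4 texts reused as non-watermarked negatives) come from the report.

```python
import random

def sample_eval_sets(c4_test_passages, n=500, prompt_words=30, seed=0):
    """Hypothetical sketch: draw n random C4 test passages, use each
    passage's prefix as a generation prompt, and keep the original
    passages as non-watermarked (human-written) negatives."""
    rng = random.Random(seed)
    picked = rng.sample(c4_test_passages, n)
    prompts = [" ".join(p.split()[:prompt_words]) for p in picked]
    negatives = picked  # original C4 text serves as the negative class
    return prompts, negatives
```

Detection is then evaluated with 500 generated attack texts as positives against these 500 human-written negatives.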