Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show SIRA achieves nearly 100% attack success rates on seven recent watermarking methods with only $0.88 per million tokens cost. Our work exposes a widely prevalent vulnerability in current watermarking algorithms. |
| Researcher Affiliation | Academia | 1 University of British Columbia, 2 Vector Institute for AI, 3 Fudan University, 4 University of Cambridge, 5 Canada CIFAR AI Chair, 6 NSERC CRC Chair. Correspondence to: Yixin Cheng <EMAIL>. |
| Pseudocode | Yes | The pseudocode of our algorithm is shown in Algorithm 1. Algorithm 1: Pseudocode for the self-information rewrite attack |
| Open Source Code | No | The source code is available at SIRA. ... We will release our code to the community to facilitate further research in developing responsible AI practices and advancing the robustness of watermarking algorithms. |
| Open Datasets | Yes | Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. ... We conduct additional experiments on the OpenGen dataset (Krishna et al., 2024), which consists of sampled passages from WikiText-103. |
| Dataset Splits | Yes | Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. We selected 500 random samples from the test set to serve as prompts for generating the subsequent 230 tokens, using the original C4 texts as non-watermarked examples. ... We used 500 generated attack texts as positive samples and 500 human-written texts as negative samples. |
| Hardware Specification | Yes | Our method runs on NVIDIA A100 GPUs (Tiny and Small run on a single GPU). ... The experiments were run on NVIDIA A100 40GB GPUs, utilizing a sequential device map for baseline methods requiring multiple GPUs. |
| Software Dependencies | No | We use the huggingface library in our experiment. No specific version numbers were provided for software dependencies. |
| Experiment Setup | Yes | For our method, we use ϵ = 0.3 as the threshold. ... The watermark hyperparameter settings are shown in Appendix A, and the detection settings adhere to the default/recommended configurations (Pan et al., 2024) of the original works. Specifically, for KGW-k, k is the number of preceding tokens to hash. ... For DIPPER-1 the lex diversity is 60 without order diversity, and for DIPPER-2 we additionally increase the order diversity by 40. The word deletion ratio is set to 0.3 and the synonym substitution ratio is set to 0.5. ... The temperature for the base model is set to 0.7. |
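
The settings quoted in the Experiment Setup and Dataset Splits rows can be collected in one place, and the word-deletion baseline is simple enough to sketch. The following is a minimal illustration under stated assumptions, not the authors' implementation: the `ReportedSetup` class, its field names, and the `delete_words` helper are all hypothetical, and "word deletion ratio" is read here as dropping that fraction of whitespace-separated words uniformly at random, which the source does not spell out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; the field names are hypothetical."""
    sira_threshold: float = 0.3              # epsilon used by the proposed attack
    num_prompts: int = 500                   # random C4 test-set samples used as prompts
    generated_tokens: int = 230              # tokens generated after each prompt
    word_deletion_ratio: float = 0.3         # word-deletion baseline
    synonym_substitution_ratio: float = 0.5  # synonym-substitution baseline
    base_model_temperature: float = 0.7


def delete_words(text: str, ratio: float, seed: int = 0) -> str:
    """Drop round(ratio * n) of the n whitespace-separated words at random.

    Assumed reading of the 'word deletion ratio'; the paper's baseline
    may select or tokenize words differently.
    """
    words = text.split()
    n_drop = round(ratio * len(words))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    drop = set(rng.sample(range(len(words)), n_drop))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


setup = ReportedSetup()
sample = "the quick brown fox jumps over the lazy dog again"
attacked = delete_words(sample, setup.word_deletion_ratio)
print(attacked)
```

With the ten-word sample and a ratio of 0.3, three words are removed and the remaining seven keep their original order. Which three are removed depends on the seed.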