reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experimental results show SIRA achieves nearly 100% attack success rates on seven recent watermarking methods with only $0.88 per million tokens cost. Our work exposes a widely prevalent vulnerability in current watermarking algorithms.
Researcher Affiliation	Academia	(1) University of British Columbia, (2) Vector Institute for AI, (3) Fudan University, (4) University of Cambridge, (5) Canada CIFAR AI Chair, (6) NSERC CRC Chair. Correspondence to: Yixin Cheng <EMAIL>.
Pseudocode	Yes	The pseudocode of our algorithm is shown in Algorithm 1. Algorithm 1: Pseudocode for Self-information rewrite attack.
Open Source Code	No	The source code is available at SIRA. ... We will release our code to the community to facilitate further research in developing responsible AI practices and advancing the robustness of watermarking algorithms.
Open Datasets	Yes	Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. ... We conduct additional experiments on the OpenGen dataset (Krishna et al., 2024), which consists of sampled passages from WikiText-103.
Dataset Splits	Yes	Following prior watermarking research (Kirchenbauer et al., 2023; Zhao et al., 2023; Liu et al., 2024; Kuditipudi et al., 2023), we utilize the C4 dataset (Raffel et al., 2020a) for general-purpose text generation scenarios. We selected 500 random samples from the test set to serve as prompts for generating the subsequent 230 tokens, using the original C4 texts as non-watermarked examples. ... We used 500 generated attack texts as positive samples and 500 human-written texts as negative samples.
Hardware Specification	Yes	Our method runs on NVIDIA A100 GPUs (Tiny, Small run on a single GPU). ... The experiments were run on NVIDIA A100 40GB GPUs, utilizing a sequential device map for baseline methods requiring multiple GPUs.
Software Dependencies	No	We use the Hugging Face library in our experiment. No specific version numbers were provided for software dependencies.
Experiment Setup	Yes	For our method, we use ϵ = 0.3 as threshold. ... The watermark hyperparameter settings are shown in Appendix A, and the detection settings adhere to the default/recommended configurations (Pan et al., 2024) of the original works. Specifically, for KGW-k, k is the number of preceding tokens to hash. ... For DIPPER-1 the lex diversity is 60 without order diversity, and for DIPPER-2 we additionally increase the order diversity by 40. The word deletion ratio is set to 0.3 and the synonym substitution ratio is set to 0.5. ... The temperature for the base model is set to 0.7.
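
To make the quoted setup values concrete, here is a minimal Python sketch of two of the pieces they refer to: the word-deletion baseline (ratio 0.3) and a self-information filter using ϵ = 0.3. It is illustrative only and is not the authors' implementation. The function names, the `[MASK]` placeholder, the toy token probabilities, and the reading of ϵ as "the fraction of highest self-information tokens to mask" are all assumptions, since the quoted text does not say how the threshold is applied. In a real run the probabilities would come from a language model rather than being supplied by hand.

```python
import math
import random

# Values quoted in the Experiment Setup row of the table.
EPSILON = 0.3              # threshold reported for the SIRA method
WORD_DELETION_RATIO = 0.3  # word-deletion baseline ratio


def delete_words(text, ratio=WORD_DELETION_RATIO, seed=0):
    """Word-deletion baseline: drop round(ratio * n) randomly chosen words."""
    words = text.split()
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n_drop = round(ratio * len(words))
    drop = set(rng.sample(range(len(words)), n_drop))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


def self_information(probs):
    """Self-information in bits, I(x) = -log2 p(x), per token probability."""
    return [-math.log2(p) for p in probs]


def mask_high_information(tokens, probs, epsilon=EPSILON, mask="[MASK]"):
    """ASSUMPTION: mask the top `epsilon` fraction of tokens ranked by
    self-information. The source only states that epsilon = 0.3 is a threshold."""
    info = self_information(probs)
    n_mask = round(epsilon * len(tokens))
    ranked = sorted(range(len(tokens)), key=lambda i: info[i], reverse=True)
    to_mask = set(ranked[:n_mask])
    return [mask if i in to_mask else t for i, t in enumerate(tokens)]


# Toy example: hand-picked probabilities stand in for a language model.
tokens = ["the", "quick", "brown", "fox", "jumps",
          "over", "the", "lazy", "dog", "today"]
probs = [0.9, 0.05, 0.1, 0.02, 0.3, 0.6, 0.9, 0.04, 0.2, 0.5]
masked = mask_high_information(tokens, probs)
shortened = delete_words(" ".join(tokens))
```

In this toy input the three least probable tokens ("fox", "lazy", "quick") carry the most self-information, so they are the ones replaced by the placeholder. The word-deletion call removes three of the ten words at random.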