Black-Box Detection of Language Model Watermarks
Authors: Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models. Further, we validate the feasibility of our tests on real-world APIs. |
| Researcher Affiliation | Academia | Thibaud Gloaguen, Nikola Jovanović, Robin Staab, Martin Vechev, ETH Zurich |
| Pseudocode | Yes | We present an additional algorithmic description of the Red-Green test (§2) in Algorithm 2, the Fixed-Sampling test (§3) in Algorithm 4 and the Cache-Augmented test (§4) in Algorithm 5. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/eth-sri/watermark-detection. |
| Open Datasets | No | To generate the samples for the original watermark detector, following the method in (Kirchenbauer et al., 2023), we generate 100 completions of 200 tokens, using prompts sampled from C4. |
| Dataset Splits | No | The paper describes experimental setups with specific query numbers and repetitions (e.g., "N1 = 10, N2 = 9, r = 1.96", "n = 1000 queries", "Q1 = Q2 = 75"), but it does not specify training, validation, or test splits for any dataset. The focus is on statistical testing of generated text, not on model training or evaluation over dataset splits. |
| Hardware Specification | No | The paper does not state the hardware (e.g., GPU models, CPU types, or memory) used to run its experiments. It mentions testing on black-box LLM deployments (GPT4, Claude 3, Gemini 1.0 Pro), but these are the target systems, not the hardware the authors used for their experimental setup. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, frameworks, or operating systems used in its methodology or experimental setup. It only discusses the conceptual aspects of watermark detection and the models under evaluation. |
| Experiment Setup | Yes | For Red-Green tests, we set N1 = 10, N2 = 9, r = 1.96, a different Σ per model based on the first Q1 samples, use 100 samples to estimate the probabilities, and use 10000 permutations in the test. ... For Fixed-Sampling tests, we use n = 1000 queries and set t = 50. For Cache-Augmented tests, we use Q1 = Q2 = 75 and assume the cache is cleared between queries in the second phase. |
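
The Experiment Setup row reports that the Red-Green test uses 10000 permutations. As a rough illustration of what a permutation test with that budget looks like, here is a minimal Python sketch. It is not the authors' implementation: the function name `permutation_p_value`, the difference-of-means statistic, and the toy inputs are all assumptions made for this example, and the paper's actual test statistic is not reproduced here.

```python
import random


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Generic one-sided permutation test on the difference of means.

    Hypothetical helper for illustration only; the statistic used in the
    paper's Red-Green test is not described in this summary.
    """
    rng = random.Random(seed)
    observed = sum(xs) / len(xs) - sum(ys) / len(ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled sample.
        rng.shuffle(pooled)
        left, right = pooled[:len(xs)], pooled[len(xs):]
        stat = sum(left) / len(left) - sum(right) / len(right)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy data (made up): clearly separated groups versus identical groups.
p_separated = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

With the default of 10000 permutations, the smallest p-value this estimator can return is 1/10001, so the permutation count bounds the resolution of the test.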