StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models
Authors: Ya Jiang, Chuxiong Wu, Massieh Kordi Boroujeny, Brian Mark, Kai Zeng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive empirical evaluations across diverse tasks highlight the stealthiness, detectability, and resilience of StealthInk, establishing it as an effective solution for LLM watermarking applications. ... 6. Experiments: We compare StealthInk with SOTA methods (Yoo et al., 2024; Qu et al., 2024; Fernandez et al., 2023) on stealthiness, detectability, and robustness. |
| Researcher Affiliation | Academia | 1Department of Computer Science, George Mason University, Fairfax, VA, USA 2Wireless Cyber Center, College of Engineering and Computing, George Mason University, Fairfax, VA, USA. Correspondence to: Ya Jiang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 shows the process of encoding a multi-bit watermark in StealthInk. ... Algorithm 2 in Appendix D. |
| Open Source Code | No | The paper describes a novel watermarking scheme called StealthInk and presents its methodology. However, it does not contain any explicit statement about releasing the source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | For text completion, unless noted otherwise, we use LLAMA2-7B (Touvron et al., 2023) and 500 randomly selected texts from the Real News Like subset of C4 (Raffel et al., 2020)... For the machine translation task, we focus on English-to-Romanian translation and employ the Multilingual BART (MBart) model (Liu et al., 2020) on the WMT14 En-Ro corpus (Bojar et al., 2014)... For the text summarization task, we employ the BART-large model (Liu et al., 2020)... we use the test set from the CNN-DM corpus (Hermann et al., 2015). |
| Dataset Splits | Yes | For the machine translation task, we utilize the WMT 16 English (En) to Romanian (Ro) dataset, comprising 1,999 examples in the test set. ... In the text summarization task, we use the test set from the CNN-DM corpus (Hermann et al., 2015), consisting of 11,490 examples on BART-large (Liu et al., 2020). |
| Hardware Specification | Yes | All experiments are conducted on the Nvidia A100 GPU with 40 GB of memory. |
| Software Dependencies | No | The paper mentions models like LLAMA2-7B, BART-large, and MBart, and uses SHA-256 as a pseudorandom function. However, it does not specify version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) used to implement the experiments. |
| Experiment Setup | Yes | For text completion, unless noted otherwise, we use LLAMA2-7B (Touvron et al., 2023) and 500 randomly selected texts from the Real News Like subset of C4 (Raffel et al., 2020), trimming a fixed number of tokens from the start as prompts (see Appendix H). ... The default temperature is 1.0 and the texture key length h is 3. The multinomial sampling strategy is applied during text generation. ... StealthInk achieves an AUC of 0.98 and a bit accuracy of 0.92 when embedding 24-bit messages in 300 tokens. |
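The table above mentions two concrete mechanisms from the paper: SHA-256 used as a pseudorandom function, and a texture key of length h = 3 (the previous three tokens seed each generation step). The sketch below illustrates how such a texture-keyed PRF could drive a bit-dependent vocabulary partition for multi-bit embedding. This is a hypothetical simplification for intuition only, not StealthInk's Algorithm 1 (which additionally preserves the token distribution for stealthiness); the function names `prf` and `embed_step` are our own.

```python
import hashlib

def prf(key: bytes, texture: tuple, salt: bytes = b"") -> int:
    """SHA-256-based pseudorandom function seeded by a secret key and
    the texture key (the last h tokens of context)."""
    data = key + salt + ",".join(map(str, texture)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def embed_step(prev_tokens, message_bits, key, vocab_size, h=3):
    """One generation step of a toy multi-bit embedder.

    The texture key selects (pseudorandomly) which message bit this
    step carries, then the vocabulary is partitioned so that sampling
    from the 'green' subset encodes that bit. A detector with the same
    key can recompute the partition and recover the bit.
    """
    texture = tuple(prev_tokens[-h:])
    # Choose which position of the multi-bit message this step encodes.
    pos = prf(key, texture, b"pos") % len(message_bits)
    bit = message_bits[pos]
    # A token is 'green' if its keyed hash parity matches the target bit.
    green = [t for t in range(vocab_size)
             if prf(key, texture + (t,), b"tok") % 2 == bit]
    return pos, bit, green
```

In this toy version the sampler would restrict (or bias) the next-token choice to the `green` subset; repeating over ~300 tokens gives the detector many noisy votes per message bit, which is consistent with the reported 0.92 bit accuracy for 24-bit messages rather than exact recovery.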