StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models

Authors: Ya Jiang, Chuxiong Wu, Massieh Kordi Boroujeny, Brian Mark, Kai Zeng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive empirical evaluations across diverse tasks highlight the stealthiness, detectability, and resilience of StealthInk, establishing it as an effective solution for LLM watermarking applications. ... 6. Experiments: We compare StealthInk with SOTA methods (Yoo et al., 2024; Qu et al., 2024; Fernandez et al., 2023) on stealthiness, detectability, and robustness.
Researcher Affiliation | Academia | (1) Department of Computer Science, George Mason University, Fairfax, VA, USA; (2) Wireless Cyber Center, College of Engineering and Computing, George Mason University, Fairfax, VA, USA. Correspondence to: Ya Jiang <EMAIL>.
Pseudocode | Yes | Algorithm 1 shows the process of encoding a multi-bit watermark in StealthInk. ... Algorithm 2 in Appendix D.
Open Source Code | No | The paper describes a novel watermarking scheme called StealthInk and presents its methodology. However, it does not contain any explicit statement about releasing the source code or provide a link to a code repository for the described methodology.
Open Datasets | Yes | For text completion, unless noted otherwise, we use LLAMA2-7B (Touvron et al., 2023) and 500 randomly selected texts from the RealNewsLike subset of C4 (Raffel et al., 2020)... For the machine translation task, we focus on English-to-Romanian translation and employ the Multilingual BART (MBart) model (Liu et al., 2020) on the WMT 14 En-Ro corpus (Bojar et al., 2014)... For the text summarization task, we employ the BART-large model (Liu et al., 2020)... we use the test set from the CNN-DM corpus (Hermann et al., 2015).
Dataset Splits | Yes | For the machine translation task, we utilize the WMT 16 English (En) to Romanian (Ro) dataset, comprising 1,999 examples in the test set. ... In the text summarization task, we use the test set from the CNN-DM corpus (Hermann et al., 2015), consisting of 11,490 examples on BART-large (Liu et al., 2020).
Hardware Specification | Yes | All experiments are conducted on the Nvidia A100 GPU with 40 GB of memory.
Software Dependencies | No | The paper mentions models like LLAMA2-7B, BART-large, and MBart, and uses SHA-256 as a pseudorandom function. However, it does not specify version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) used to implement the experiments.
Experiment Setup | Yes | For text completion, unless noted otherwise, we use LLAMA2-7B (Touvron et al., 2023) and 500 randomly selected texts from the RealNewsLike subset of C4 (Raffel et al., 2020), trimming a fixed number of tokens from the start as prompts (see Appendix H). ... The default temperature is 1.0 and the texture key length h is 3. The multinomial sampling strategy is applied during text generation. ... StealthInk achieves an AUC of 0.98 and a bit accuracy of 0.92 when embedding 24-bit messages in 300 tokens.
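Several rows above name concrete mechanisms from the paper's setup: SHA-256 as a pseudorandom function, a texture key of length h = 3, temperature-1.0 multinomial sampling, and bit accuracy as the decoding metric. The sketch below illustrates those generic building blocks only; the function names, key format, and seeding scheme are illustrative assumptions, not the paper's Algorithm 1.

```python
import hashlib
import math
import random

def prf_seed(prev_tokens, secret_key):
    """Derive a PRNG seed from the last h context tokens via SHA-256.

    Generic illustration of a hash-based texture key; the paper's exact
    construction may differ.
    """
    data = secret_key + b"|" + b",".join(str(t).encode() for t in prev_tokens)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def sample_token(logits, prev_tokens, secret_key, temperature=1.0, h=3):
    """Temperature-scaled multinomial sampling seeded by the texture key."""
    rng = random.Random(prf_seed(prev_tokens[-h:], secret_key))
    # Softmax over temperature-scaled logits (max-subtracted for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of correctly recovered message bits (the metric reported
    alongside AUC in the experiments row)."""
    assert len(decoded_bits) == len(true_bits)
    return sum(int(a == b) for a, b in zip(decoded_bits, true_bits)) / len(true_bits)
```

Because the sampler is re-seeded from the same hashed context, a detector holding the secret key can replay the pseudorandomness at each position; that determinism is what multi-bit decoding schemes of this kind rely on.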