Undetectable Steganography for Language Models

Authors: Or Zamir

TMLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | While this paper is theoretical in nature and the properties of the suggested scheme are proven rigorously, we also implemented the scheme and provide empirical examples. In Section 7 we discuss our implementation of the scheme and some empirical evaluation of it. In Figure 2, we estimate the number of message bits we can hide in a response of a certain length. For each length of response, we ran our scheme 100 times using the LLM model GPT-2 (RWC+19) on a randomly chosen prompt from the list of example prompts provided by OpenAI on their GPT-2 webpage. |
| Researcher Affiliation | Academia | Or Zamir, School of Computer Science, Tel Aviv University |
| Pseudocode | Yes | The pseudo-codes for generation (Algorithm 1) and detection (Algorithm 2) of the watermark appear in the Appendix. In CGZ, those algorithms are then generalized to also support the detection of the watermark from a substring of the response, and not only from the response in its entirety as sketched above. ... Algorithm 3: One-query steganography algorithm Steg_k ... Algorithm 4: One-query retriever Retr_k ... Algorithm 5: Steganography algorithm Steg_k ... Algorithm 6: Retriever algorithm Retr_k |
| Open Source Code | Yes | Code available at: https://github.com/OrZamir/steg |
| Open Datasets | No | The paper uses LLMs (GPT-2, Llama 2) to generate text for evaluation, but does not use a pre-existing dataset for its experiments or provide a dataset for public access. The evaluation relies on generating responses from these models using prompts. |
| Dataset Splits | No | The paper describes experiments in which responses are generated by LLMs. It does not involve traditional dataset splits for training, validation, or testing, as its focus is on embedding information into LLM outputs. |
| Hardware Specification | No | The paper mentions using LLMs such as GPT-2 and Llama 2 for its empirical evaluations, but provides no details about the hardware (e.g., GPU or CPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper mentions LLMs (GPT-2, Llama 2) but does not give version numbers for any software libraries, frameworks, or programming languages used to implement the scheme. |
| Experiment Setup | Yes | We ran it with threshold parameter t = 2, which we didn't optimize. |
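The report only names the paper's steganography and retriever algorithms (Steg_k and Retr_k above) without showing how key-based embedding and retrieval fit together. The toy sketch below illustrates the general shape of such a scheme: a shared key pseudorandomly labels candidate tokens, the encoder picks tokens whose labels spell out the message bits, and the decoder recomputes the labels to read the bits back. Every name here (`bit_label`, `steg_encode`, `steg_decode`, the HMAC-based PRF stand-in, the greedy candidate choice) is a hypothetical illustration, not the paper's actual algorithm: the real scheme encrypts the message and samples from the model's true distribution, which is what makes it undetectable, and this toy does neither.

```python
import hmac
import hashlib


def bit_label(key: bytes, position: int, token: str) -> int:
    """Pseudorandom 0/1 label for a token at a position (PRF stand-in)."""
    digest = hmac.new(key, f"{position}:{token}".encode(), hashlib.sha256).digest()
    return digest[0] & 1


def steg_encode(key: bytes, candidates_per_step: list, message_bits: list) -> list:
    """Toy encoder: at each step, emit a candidate token whose
    pseudorandom label equals the next message bit."""
    response = []
    for pos, (bit, candidates) in enumerate(zip(message_bits, candidates_per_step)):
        matching = [tok for tok in candidates if bit_label(key, pos, tok) == bit]
        if not matching:
            # Rare when candidate sets are large; a real scheme handles this case.
            raise RuntimeError(f"no candidate with label {bit} at position {pos}")
        response.append(matching[0])
    return response


def steg_decode(key: bytes, response: list, n_bits: int) -> list:
    """Toy retriever: recompute the labels of the emitted tokens."""
    return [bit_label(key, pos, tok) for pos, tok in enumerate(response)][:n_bits]
```

A reproduction harness in the spirit of Figure 2 would wrap such an encoder in a loop, varying the response length and counting how many message bits fit per run.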