A Watermark for Order-Agnostic Language Models
Authors: Ruibo Chen, Yihan Wu, Yanshuo Chen, Chenxi Liu, Junfeng Guo, Heng Huang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations on order-agnostic LMs, such as Protein MPNN and CMLM, demonstrate PATTERN-MARK's enhanced detection efficiency, generation quality, and robustness, positioning it as a superior watermarking technique for order-agnostic LMs. Through comprehensive experiments on two popular order-agnostic LMs, Protein MPNN (Dauparas et al., 2022) and CMLM (Ghazvininejad et al., 2019), we demonstrate the superiority of PATTERN-MARK in terms of detection efficiency, generation quality, and robustness compared to baseline methods. Our experimental section consists of three parts. In the first part, we compare the detection efficiency of PATTERN-MARK with the baseline. In the second part, we evaluate the generation quality of PATTERN-MARK. In the third part, we assess the robustness of PATTERN-MARK when subjected to random token modification and paraphrasing attacks. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, MD, USA |
| Pseudocode | Yes | Algorithm 1: PATTERN-MARK generator; Algorithm 2: PATTERN-MARK detector; Algorithm 3: Compute pattern occurrence probability under the null hypothesis; Algorithm 4: Compute pattern occurrence probability under the null hypothesis |
| Open Source Code | No | The paper does not explicitly provide an unambiguous statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | For CMLM, we use the data collected from news crawl (https://data.statmt.org/news-crawl/), news.2015.ro.shuffled, keeping samples whose length is larger than 128 to encourage longer generation. The filtered dataset has 1003 samples. For Protein MPNN, we use the protein features from the RCSB Protein Data Bank (https://www.rcsb.org/), published from Jan. 1, 2020 to Dec. 31, 2023. We limit the number of polymer residues per deposited model to between 400 and 500. The filtered dataset has 747 samples. |
| Dataset Splits | No | The paper mentions the total number of samples used for evaluation (e.g., "around 800 generated protein sequences" and "around 1000 generated sequences") but does not provide specific train/validation/test splits for these or any underlying datasets for reproducing the data partitioning for their experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like Protein MPNN (Dauparas et al., 2022) and CMLM (Ghazvininejad et al., 2019) and a specific checkpoint "v48_020 model checkpoint for Protein MPNN". However, it does not list specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks) required to replicate their experimental setup for PATTERN-MARK itself. |
| Experiment Setup | Yes | We select the key set K = {k1, k2}, the Markov-chain transition matrix A = [[0, 1], [1, 0]], and the initial distribution Q = [0.5, 0.5]. The key patterns are defined as T = {k1k2k1 . . . , k2k1k2 . . .}, where k1 and k2 appear alternately, T ⊂ K^m. Under this configuration, the probability P_{T,n} can be calculated using Algorithm 4, which optimizes the process described in Algorithm 3. We select δ ∈ {0.5, 0.75, 1.0, 1.25, 1.5} for the protein generation task, and δ ∈ {1, 2, 3, 4, 5} for the machine translation task. |
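
The configuration quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It samples a key sequence from the stated Markov chain (A = [[0, 1], [1, 0]], Q = [0.5, 0.5]), counts length-m windows that match an alternating pattern, and computes a tail probability for that count. The function names, the integer encoding of k1 and k2 as 0 and 1, and the null model (keys drawn independently and uniformly from the two-key set) are assumptions made for illustration only. This is not the paper's Algorithm 3 or Algorithm 4, and it leaves out how keys are tied to tokens and how δ is applied.

```python
import random

# Values quoted in the Experiment Setup row; keys k1, k2 are encoded as 0, 1.
A = [[0.0, 1.0], [1.0, 0.0]]   # Markov-chain transition matrix
Q = [0.5, 0.5]                 # initial distribution


def sample_key_sequence(n, transition, initial, rng):
    """Sample n key indices from a Markov chain (hypothetical helper)."""
    states = range(len(initial))
    keys = [rng.choices(states, weights=initial)[0]]
    for _ in range(n - 1):
        keys.append(rng.choices(states, weights=transition[keys[-1]])[0])
    return keys


def count_alternating_patterns(keys, m):
    """Count length-m windows in which every adjacent pair of keys differs.

    With two keys this is the same as matching k1k2k1... or k2k1k2...
    """
    count = 0
    for start in range(len(keys) - m + 1):
        window = keys[start:start + m]
        if all(a != b for a, b in zip(window, window[1:])):
            count += 1
    return count


def null_tail_probability(n, m, observed):
    """P(count >= observed) when n keys are i.i.d. uniform over two keys.

    ASSUMED null model, computed by dynamic programming over
    (alternating run length capped at m, pattern count so far).
    """
    dist = {(1, 1 if m == 1 else 0): 1.0}
    for _ in range(n - 1):
        nxt = {}
        for (run, count), p in dist.items():
            # The next key differs from the previous one with probability 1/2
            # (run grows) and repeats it with probability 1/2 (run resets).
            for new_run in (min(run + 1, m), 1):
                new_count = count + (1 if new_run >= m else 0)
                state = (new_run, new_count)
                nxt[state] = nxt.get(state, 0.0) + 0.5 * p
        dist = nxt
    return sum(p for (_, count), p in dist.items() if count >= observed)


if __name__ == "__main__":
    keys = sample_key_sequence(10, A, Q, random.Random(0))
    hits = count_alternating_patterns(keys, m=3)
    print(keys, hits, null_tail_probability(10, 3, hits))
```

Because A deterministically swaps the two keys, every sampled sequence alternates, so all n - m + 1 windows match a pattern in T. Under the assumed uniform null, a count that high is unlikely, which is the intuition behind a count-based detector.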