Speech Watermarking with Discrete Intermediate Representations
Authors: Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our framework achieves state-of-the-art performance in robustness and imperceptibility, simultaneously. Moreover, our flexible frame-wise approach can serve as an efficient solution for both voice cloning detection and information hiding. Additionally, DiscreteWM can encode 1 to 150 bits of watermark information within a 1-second speech clip, indicating its encoding capacity. |
| Researcher Affiliation | Academia | Zhejiang University |
| Pseudocode | No | The paper describes methods using prose and mathematical formulations, but no clearly labeled 'Pseudocode' or 'Algorithm' blocks are present. |
| Open Source Code | No | Demo https://Discrete WM.github.io/discrete wm. This link points to a demonstration page, not explicitly to the source code repository for the methodology described in the paper. |
| Open Datasets | Yes | Datasets. For training, we employ the standard training set of LibriTTS (Zen et al. 2019), which contains approximately 585 hours of English speech at 24 kHz sampling rate. |
| Dataset Splits | Yes | For training, we employ the standard training set of LibriTTS (Zen et al. 2019)... We randomly select 100 text transcriptions and 100 speech prompts from the LibriTTS test-clean set... The test set also includes all of the speech samples from the test-clean set of LibriTTS. |
| Hardware Specification | Yes | The RTF (Real-Time Factor) evaluation is conducted with 1 NVIDIA A100 GPU and batch size 1. |
| Software Dependencies | No | The paper mentions software components and techniques like STFT, VQ-VAE, and GANs, but it does not specify any version numbers for libraries or tools used for implementation. |
| Experiment Setup | Yes | For the Short-Time Fourier Transform operation (STFT), we adopt a filter length of 400, a hop length of 80, and a window function applied to each frame with a length of 400... λadv is the hyper-parameter to balance the three terms, which is set to 10 2... We set the watermark ratio m of DiscreteWM to 10%. |
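
To make the quoted STFT settings concrete, the following is a minimal sketch of a framed STFT with a filter length of 400, a hop length of 80, and a window length of 400. It is not the authors' implementation: the Hann window, the absence of edge padding, and the choice to apply the transform directly at the 24 kHz rate quoted for the training data are all assumptions, since the quoted text specifies none of them.

```python
import numpy as np

SAMPLE_RATE = 24_000  # assumption: the rate quoted for the training data
N_FFT = 400           # "filter length" from the quoted setup
HOP_LENGTH = 80       # hop length from the quoted setup
WIN_LENGTH = 400      # window length from the quoted setup


def stft(signal, n_fft=N_FFT, hop=HOP_LENGTH, win_length=WIN_LENGTH):
    """Return a complex spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    # Assumption: Hann window; the quoted text does not name the window type.
    window = np.hanning(win_length)
    # Assumption: no padding, so only fully covered frames are kept.
    n_frames = 1 + (len(signal) - win_length) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_length] * window for i in range(n_frames)]
    )
    # One-sided FFT of each windowed frame, then transpose to (bins, frames).
    return np.fft.rfft(frames, n=n_fft, axis=1).T


# One second of noise stands in for a 1-second speech clip.
one_second = np.random.default_rng(0).standard_normal(SAMPLE_RATE)
spec = stft(one_second)
print(spec.shape)  # (201, 296)
```

Under these assumptions a 1-second clip yields 296 frames of 201 frequency bins each, which is the frame grid a frame-wise watermark would be laid out on.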