Speed Master: Quick or Slow Play to Attack Speaker Recognition
Authors: Zhe Ye, Wenjie Zhang, Ying Ren, Xiangui Kang, Diqun Yan, Bin Ma, Shiqi Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive experiments demonstrate that Speed Master can achieve an attack success rate (ASR) over 99% in the digital domain with only a 0.6% poisoning rate. Additionally, we validate the feasibility of Speed Master in the real world and its resistance to typical defensive measures. Extensive experiments are conducted on two datasets and two models to evaluate our method. |
| Researcher Affiliation | Academia | 1Guangdong Key Lab of Information Security, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China 2Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China 3Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China 4Department of Computer Science, City University of Hong Kong, Hong Kong, China |
| Pseudocode | No | The paper describes the methodology and training process with figures and textual explanations, but it does not include a dedicated section for pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Moreover, our experiments are conducted on two benchmarks in the field: VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and LibriSpeech (Panayotov et al. 2015). |
| Dataset Splits | No | The paper mentions selecting a proportion of samples for poisoning based on a 'poisoning rate ρ%' and building a 'backdoor dataset Db' from poisoned and non-poisoned data. It also states that 'benign testing samples' and 'poisoned testing samples' are used to compute the metrics (BA and ASR). However, it does not give specific train/test/validation split ratios or sample counts for the overall datasets (VoxCeleb1 and LibriSpeech) used in the experiments. |
| Hardware Specification | Yes | We performed all experiments on a server running Ubuntu 20.04, equipped with four NVIDIA GeForce RTX A6000 GPUs, utilizing a single card with 48GB of VRAM for the experiments. |
| Software Dependencies | Yes | The experiments were conducted using PyTorch version 1.11.0 and Torchaudio version 0.11.0. |
| Experiment Setup | Yes | For the default attack setting, we select 100 as the target label and set the poisoning rate to 2% for the tempo method and 0.6% for the other attacks. For our method, we use 0.8 as the default speed rate. During training, we incorporated room impulse response (RIR) and noise for trigger enhancement to ensure robustness in real-world conditions. |
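The setup above describes a speed-change backdoor: a small fraction ρ of training utterances is slowed to a speed rate of 0.8 and relabeled to the target speaker. The sketch below illustrates this poisoning recipe under stated assumptions; the `speed_trigger` and `poison` helpers, the linear-interpolation resampler, and the in-memory `(waveform, label)` dataset format are hypothetical illustrations, not the authors' implementation (which uses PyTorch/Torchaudio and additionally applies RIR and noise augmentation).

```python
import numpy as np

def speed_trigger(waveform: np.ndarray, speed_rate: float = 0.8) -> np.ndarray:
    """Apply a speed-change trigger by resampling the waveform.

    speed_rate < 1.0 slows playback (output is longer); > 1.0 speeds it up.
    Illustrative linear-interpolation resampler only; a real pipeline would
    use a proper resampler (e.g. torchaudio's sox 'speed' effect).
    """
    n_out = int(round(len(waveform) / speed_rate))
    # Fractional read positions in the original signal for each output sample.
    positions = np.arange(n_out) * speed_rate
    return np.interp(positions, np.arange(len(waveform)), waveform)

def poison(dataset, rho=0.006, target_label=100, speed_rate=0.8, seed=0):
    """Poison a fraction rho of (waveform, label) pairs and relabel them
    to the target speaker, leaving the remaining samples untouched."""
    rng = np.random.default_rng(seed)
    n_poison = int(rho * len(dataset))
    chosen = set(rng.choice(len(dataset), size=n_poison, replace=False).tolist())
    out = []
    for i, (wav, label) in enumerate(dataset):
        if i in chosen:
            out.append((speed_trigger(wav, speed_rate), target_label))
        else:
            out.append((wav, label))
    return out
```

With speed_rate = 0.8, each poisoned utterance becomes 1/0.8 = 1.25× longer, which is the audible "slow play" cue the trained model learns to associate with the target label.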