Speed Master: Quick or Slow Play to Attack Speaker Recognition

Authors: Zhe Ye, Wenjie Zhang, Ying Ren, Xiangui Kang, Diqun Yan, Bin Ma, Shiqi Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive experiments demonstrate that Speed Master can achieve an attack success rate (ASR) over 99% in the digital domain, with only a 0.6% poisoning rate. Additionally, we validate the feasibility of Speed Master in the real world and its resistance to typical defensive measures. Extensive experiments are conducted on two datasets and two models to evaluate our method.
Researcher Affiliation | Academia | 1. Guangdong Key Lab of Information Security, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; 2. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China; 3. Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; 4. Department of Computer Science, City University of Hong Kong, Hong Kong, China
Pseudocode | No | The paper describes the methodology and training process with figures and textual explanations, but it does not include a dedicated pseudocode section or algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing code, nor does it provide a link to a code repository.
Open Datasets | Yes | Moreover, our experiments are conducted on two benchmarks in the field: VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and LibriSpeech (Panayotov et al. 2015).
Dataset Splits | No | The paper mentions selecting a proportion of samples for poisoning based on a poisoning rate ρ% and creating a backdoor dataset Db from poisoned and non-poisoned data. It also states that benign testing samples and poisoned testing samples are used for the metrics (BA and ASR). However, it does not provide specific train/test/validation split ratios or counts for the overall datasets (VoxCeleb1 and LibriSpeech) used in the experiments.
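The ρ% poisoning procedure described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: `poison_fn` stands in for the speed trigger, and the 0.6% rate and target label 100 come from the paper's default setting.

```python
import random

def build_backdoor_dataset(samples, poison_rate, target_label, poison_fn, seed=0):
    """Poison a `poison_rate` fraction of `samples` ((waveform, label) pairs):
    selected samples get the trigger applied and their label flipped to
    `target_label`; the remaining samples stay benign.  A sketch only --
    the paper's pipeline is not released."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * poison_rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))
    backdoor = []
    for i, (wav, label) in enumerate(samples):
        if i in poison_idx:
            backdoor.append((poison_fn(wav), target_label))  # triggered + relabeled
        else:
            backdoor.append((wav, label))                    # benign, unchanged
    return backdoor

# Example: 1000 samples at the paper's default 0.6% rate -> 6 poisoned samples.
clean = [(float(i), 0) for i in range(1000)]
backdoor = build_backdoor_dataset(clean, 0.006, target_label=100,
                                  poison_fn=lambda w: w)
```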
Hardware Specification | Yes | We performed all experiments on a server running Ubuntu 20.04, equipped with four NVIDIA GeForce RTX A6000 GPUs, utilizing a single card with 48 GB of VRAM for the experiments.
Software Dependencies | Yes | The experiments were conducted using PyTorch version 1.11.0 and Torchaudio version 0.11.0.
Experiment Setup | Yes | For the default attack setting, we select 100 as the target label and set the poisoning rate to 2% for the tempo method and 0.6% for the other attacks. For our method, we use 0.8 as the default speed rate. During training, we incorporate room impulse response (RIR) and noise for trigger enhancement to ensure robustness in real-world conditions.
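A speed trigger at rate 0.8 amounts to resampling the waveform so it plays back at 0.8× speed (slowed, shifting both tempo and pitch). A minimal NumPy sketch under that assumption, using linear interpolation rather than the Torchaudio/SoX resampler the authors presumably used:

```python
import numpy as np

def apply_speed_trigger(waveform: np.ndarray, speed_rate: float = 0.8) -> np.ndarray:
    """Resample a mono waveform to play at `speed_rate` times the original
    speed: rate < 1 slows playback (more output samples), rate > 1 speeds
    it up.  Linear interpolation stands in for a proper resampler."""
    n_in = len(waveform)
    n_out = int(round(n_in / speed_rate))        # 0.8x speed -> 1.25x length
    src_pos = np.arange(n_out) * speed_rate      # source position of each output sample
    return np.interp(src_pos, np.arange(n_in), waveform)

# Example: a 1000-sample waveform slowed to 0.8x becomes 1250 samples long.
wav = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
slowed = apply_speed_trigger(wav, 0.8)
```

In practice the same effect is obtained with `torchaudio.sox_effects` using the `speed` effect (followed by a rate correction), which matches the PyTorch 1.11.0 / Torchaudio 0.11.0 stack reported above.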