Enhancing Transferability of Audio Adversarial Example for Both Frequency- and Time-domain

Authors: Zilin Tian, Yunfei Long, Liguo Zhang, Jiahong Zhao

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive evaluations on diverse datasets consistently demonstrate that AIE outperforms existing methods, establishing its effectiveness in enhancing adversarial transferability across domains.
Researcher Affiliation Academia 1 Harbin Engineering University, 2 University of Southampton, EMAIL, EMAIL
Pseudocode Yes Algorithm 1 AIE with MI-FGSM attack
Input: Surrogate models f_w, f_s; a natural audio example x with label y
Parameter: the perturbation magnitude ε; the number of iterations T; the decay factor µ; the hyper-parameter k
Output: An adversarial example x^adv
1: Initialize: α = ε/T; M_0 = 0; x_0^adv = x
2: for t = 1 to T do
3:   Initialize η = 0.5
4:   # Calculate the discrepancy ratio between the individual potential outputs and the ensemble potential output
5:   Calculate the individual potential outputs p_s and p_w using Eq. 8
6:   Calculate the ensemble potential output p using Eq. 9
7:   Calculate the discrepancy ratio ρ = cos(p_s, p) / cos(p_w, p)
8:   # Adaptively adjust the domain weights based on the discrepancy ratio
9:   Update the domain weight η using Eq. 10
10:  Calculate the inter-domain ensemble loss L_E((x^adv, η), y) with the updated η
11:  # Update the momentum using the gradient of the inter-domain ensemble loss
12:  M_{t+1} = µ·M_t + ∇_{x_t^adv} L(x_t^adv, η) / ||∇_{x_t^adv} L(x_t^adv, η)||_1
13:  # Update the adversarial example
14:  x_{t+1}^adv = Π_{B_ε(x)}(x_t^adv + α·sign(M_{t+1}))
15: end for
16: return x^adv = x_T^adv
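The iterative structure of Algorithm 1 can be sketched in NumPy as follows. This is a minimal reading of the pseudocode, not the authors' implementation: the forms of Eqs. 8-10 are not reproduced in the report, so the ensemble potential output (a weighted average) and the sigmoid update of the domain weight η are assumptions, and `grad_fn` and `potential_fns` are hypothetical stand-ins for the surrogate-model computations.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two flattened vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def aie_mifgsm(x, grad_fn, potential_fns, eps=0.01, T=10, mu=1.0, k=10.0):
    """One possible reading of Algorithm 1 (AIE with MI-FGSM).

    grad_fn(x_adv, eta)  -> gradient of the inter-domain ensemble loss L_E
    potential_fns(x_adv) -> (p_s, p_w), the individual potential outputs (Eq. 8)
    """
    alpha = eps / T                         # step size (line 1)
    momentum = np.zeros_like(x)             # M_0 = 0
    x_adv = x.copy()
    for _ in range(T):
        eta = 0.5                           # re-initialized each iteration (line 3)
        p_s, p_w = potential_fns(x_adv)     # Eq. 8
        p = eta * p_s + (1.0 - eta) * p_w   # assumed form of Eq. 9 (ensemble output)
        rho = cos_sim(p_s, p) / (cos_sim(p_w, p) + 1e-12)   # discrepancy ratio (line 7)
        eta = 1.0 / (1.0 + np.exp(-k * (rho - 1.0)))        # assumed sigmoid form of Eq. 10
        g = grad_fn(x_adv, eta)                             # gradient of L_E (line 12)
        momentum = mu * momentum + g / (np.abs(g).sum() + 1e-12)  # L1-normalized momentum
        x_adv = x_adv + alpha * np.sign(momentum)           # signed gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps)            # projection onto B_eps(x) (line 14)
    return x_adv
```

With dummy surrogates the loop stays inside the ε-ball by construction, which is the invariant the projection on line 14 enforces.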
Open Source Code No The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets Yes To comprehensively evaluate the effectiveness of the proposed method, we conduct extensive experiments on two widely recognized datasets for audio classification tasks: UrbanSound8K [Salamon et al., 2014] for environmental sound classification and ShipsEar [Santos-Domínguez et al., 2016] for underwater acoustic target identification.
Dataset Splits No The paper states: "From each dataset, we randomly select 1000 clean audio examples, ensuring that each is correctly classified by all evaluated models and preventing data overlap." This describes the selection of examples for evaluation, not the training/validation/test splits used to train the models themselves. Specific split percentages or counts for training the models are not provided.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9).
Experiment Setup Yes Hyper-parameters. We empirically set the maximum perturbation to 0.01 (ε = 0.01), the number of iterations T = 10, and the step size α = 0.002. For MI and NI, we set the decay factor µ = 1.0. For VMI, we set the number of sampled examples N = 20 and the upper bound of the neighborhood size β = 1.5·ε. For EMI, we set the number of sampled examples to 11, the sampling interval bound to 7, and adopt linear sampling. The number of inner updates in SVRE is set to four times the number of models. The tolerance threshold and temperature coefficient in AdaEA are set to 0.3 and 10.
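The settings quoted above can be collected into a single configuration for reference. The key names below are illustrative, not taken from the authors' code; only the values come from the paper.

```python
# Hyper-parameter settings reported in the paper, gathered as a plain dict.
# Key names are hypothetical; values are as quoted in the Experiment Setup row.
EPSILON = 0.01  # maximum perturbation

AIE_HPARAMS = {
    "epsilon": EPSILON,
    "iterations": 10,                   # T
    "step_size": 0.002,                 # alpha
    "mi_ni_decay": 1.0,                 # decay factor mu for MI / NI
    "vmi_samples": 20,                  # N sampled examples for VMI
    "vmi_beta": 1.5 * EPSILON,          # neighborhood upper bound beta = 1.5 * epsilon
    "emi_samples": 11,
    "emi_interval_bound": 7,
    "emi_sampling": "linear",
    "svre_inner_updates_per_model": 4,  # inner updates = 4 x number of models
    "adaea_tolerance": 0.3,
    "adaea_temperature": 10,
}
```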