X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Authors: Hanxun Huang, Sarah Monazam Erfani, Yige Li, Xingjun Ma, James Bailey
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations demonstrate that X-Transfer significantly outperforms previous state-of-the-art UAP methods, establishing a new benchmark for adversarial transferability across CLIP models. ... In this work, we propose the X-Transfer attack, a novel attack method that generates UAPs via an efficient surrogate scaling strategy applied to a large number of surrogate models. ... We conduct extensive experiments to demonstrate the effectiveness of X-Transfer... |
| Researcher Affiliation | Academia | 1School of Computing and Information Systems, The University of Melbourne, Australia 2School of Computing and Information Systems, Singapore Management University, Singapore 3School of Computer Science, Fudan University, China. Correspondence to: Yige Li <EMAIL>, Xingjun Ma <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 X-Transfer. Input: surrogate dataset D, search space S = {f_1, ..., f_N}, total number of optimisation steps j, momentum m, number of selections k. Initialise R, T as zero-filled arrays of length N; initialise δ randomly. for step = 1 to j do: x = sample(D) {randomly sample a batch of images}; x' = x + δ; µ = UCB(R, T) {compute UCB scores}; F_K = TopK(µ, k, S) {select top-k encoders}; for f_i in F_K do: z_i = f_i^I(x), z'_i = f_i^I(x'); compute L_i(A, z_i, z'_i) {follow Eq. (3)}; R_i = (1 − m)·R_i + m·L_i {moving average}; T_i = T_i + 1; end for; L = (1/k) Σ_{i=1}^{k} L_i {follow Eq. (5)}; δ = δ − η·sign(∇L(δ)); δ = project(δ, −ϵ, ϵ); end for |
| Open Source Code | Yes | The code is publicly available in our GitHub repository. ... Building on this, we establish a new benchmark, X-TransferBench, which offers a comprehensive, open-source collection of UAPs and TUAPs for super transferability studies. |
| Open Datasets | Yes | We use ImageNet (Deng et al., 2009) as the default surrogate dataset. ... Beyond ImageNet, we employ CIFAR-10 (C-10), CIFAR-100 (C-100) (Krizhevsky et al., 2009), Food (Bossard et al., 2014), GTSRB (Stallkamp et al., 2012), Stanford Cars (Cars) (Krause et al., 2013), STL10 (Coates et al., 2011), SUN397 (Xiao et al., 2016), MSCOCO (Chen et al., 2015), Flickr-30K (Young et al., 2014), OK-VQA (Marino et al., 2019), and VizWiz (Gurari et al., 2018) datasets to evaluate cross-domain transferability. |
| Dataset Splits | No | The paper refers to using various datasets for evaluation (e.g., ImageNet, CIFAR-10/100, MSCOCO) and states, 'We apply the same UAP to every image in each dataset to evaluate cross-data transferability.' While it uses standard datasets, it does not explicitly provide percentages, sample counts, or a clear methodology for the training/validation/test splits used in its own experiments or for UAP generation. It also states 'We use ImageNet (Deng et al., 2009) as the default surrogate dataset' but does not detail how ImageNet was split for UAP generation. |
| Hardware Specification | No | The paper mentions 'GPU Days' in Table 3 and 'consistent hardware settings' in section 4.2, implying the use of GPUs. The acknowledgement section also states 'This research was supported by The University of Melbourne's Research Computing Services and the Petascale Campus Initiative.' However, it does not provide specific details such as GPU model numbers (e.g., NVIDIA A100), CPU models, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014) as the optimiser' and states 'For all experiments, we utilise the open-source implementation OpenCLIP (Ilharco et al., 2021).' However, it does not provide specific version numbers for these or any other software dependencies (e.g., Python version, PyTorch version) required for replication. |
| Experiment Setup | Yes | UAP Generation. We use ImageNet (Deng et al., 2009) as the default surrogate dataset. The value of k is set to 4 for the Base search space, 8 for the Mid search space, and 16 for the Large search space. Following Fang et al. (2024b); Zhang et al. (2024), we employ L∞-norm bounded perturbations with ϵ = 12/255. We use a step size η of 0.5/255. ... For the adversarial patch, the value of α is set to 3.0 × 10⁻⁵, 2.0 × 10⁻⁵, and 1.0 × 10⁻⁵ for the Base, Mid, and Large search spaces, respectively. The value of β is set to 70. For the L2-norm perturbation, the value of c is set to 0.025, 0.02, and 0.015 for the Base, Mid, and Large search spaces, respectively. We use Adam (Kingma & Ba, 2014) as the optimiser for the L2-norm perturbation and the adversarial patch. The learning rate is set to 0.05, and no weight decay is used. For all perturbations, we perform the optimisation for 1 epoch on the surrogate dataset (ImageNet). The batch size is set to 1024. |
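The quoted pseudocode and hyperparameters can be combined into a minimal numpy sketch of the X-Transfer loop: UCB-guided selection of top-k surrogate encoders, a moving-average reward per encoder, a sign-gradient update on the UAP δ, and projection into the L∞ ball with ϵ = 12/255 and η = 0.5/255. The `encoders`, `loss_fn`, and `sample_batch` interfaces and the UCB exploration constant `c` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ucb_scores(R, T, step, c=2.0):
    """Upper-confidence-bound score per encoder: moving-average loss plus
    an exploration bonus for rarely selected encoders (stand-in formula)."""
    return R + c * np.sqrt(np.log(step + 1) / (T + 1e-8))

def x_transfer_sketch(encoders, sample_batch, loss_fn, n_steps=10, k=2,
                      eps=12 / 255, eta=0.5 / 255, momentum=0.9,
                      shape=(3, 224, 224), seed=0):
    """Hedged sketch of Algorithm 1 (X-Transfer): UCB-based surrogate
    scaling with a sign-gradient UAP update and L-inf projection."""
    N = len(encoders)
    R = np.zeros(N)   # moving-average loss per encoder
    T = np.zeros(N)   # selection counts per encoder
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=shape)  # initialise UAP randomly
    for step in range(1, n_steps + 1):
        x = sample_batch()                      # sample a batch of images
        mu = ucb_scores(R, T, step)             # compute UCB scores
        topk = np.argsort(mu)[-k:]              # select top-k encoders
        grads = []
        for i in topk:
            loss, grad = loss_fn(encoders[i], x, x + delta)  # Eq. (3) stand-in
            R[i] = (1 - momentum) * R[i] + momentum * loss   # moving average
            T[i] += 1
            grads.append(grad)
        g = np.mean(grads, axis=0)              # average over top-k (Eq. (5))
        delta = delta - eta * np.sign(g)        # sign-gradient step
        delta = np.clip(delta, -eps, eps)       # project into the L-inf ball
    return delta
```

The `loss_fn` callback is assumed to return both the per-encoder loss and its gradient with respect to the adversarial input; in practice this would come from autodiff over a frozen CLIP image encoder rather than a hand-written gradient.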