Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boosting the Transferability of Video Adversarial Examples via Temporal Translation
Authors: Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
AAAI 2022, pp. 2659-2667 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples. For transfer-based attack against video recognition models, it achieves a 61.56% average attack success rate on the Kinetics-400 and 48.60% on the UCF-101. |
| Researcher Affiliation | Academia | ¹Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; ²Shanghai Collaborative Innovation Center on Intelligent Visual Computing. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Temporal translation (TT) attack. Input: loss function J, clean video x, ground-truth class y. Parameters: perturbation budget ϵ, iteration number I, shift length L, weight matrix W. Output: the adversarial example. 1: x₀ ← x; 2: α ← ϵ/I; 3: for i = 0 to I−1 do; 4: x_{i+1} = clip_{x,ϵ}(x_i + α·g); 5: end for; 6: return x_I |
| Open Source Code | Yes | Code is available at https://github.com/zhipeng-wei/TT. |
| Open Datasets | Yes | We evaluate our approach using UCF-101 (Soomro, Zamir, and Shah 2012) and Kinetics-400 datasets (Kay et al. 2017), which are widely used datasets for video recognition. |
| Dataset Splits | No | The paper mentions using 'the Kinetics-400 validation dataset' for evaluation, but it does not specify the explicit proportions (e.g., percentages or counts) of the training, validation, and test splits for the datasets (UCF-101 and Kinetics-400) used to train the models, nor does it cite a standard split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions that models were 'trained on the RGB domain'. |
| Software Dependencies | No | The paper mentions that the models used are 'implemented in https://cv.gluon.ai/model_zoo/action_recognition.html', implying the use of GluonCV. However, it does not provide specific version numbers for any software dependencies, such as Python, PyTorch/MXNet, or GluonCV itself. |
| Experiment Setup | Yes | In our experiments, video recognition models with ResNet-101 as the backbone are used as white-box models for adversarial example generation. We set the maximum perturbation as ϵ = 16 for all experiments. For the iterative attack, we set the iteration number to I = 10, and thus the step size α = 1.6. For our method, the shift length L is set as 7, the weight matrix W is generated with a Gaussian function, and the adjacent shift is adopted in the temporal translation. Input clips are formed by randomly cropping out 64 consecutive frames from videos and then skipping every other frame. The spatial size of the input is 224 × 224. |
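The iterative update extracted in the Pseudocode row can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `grad_fn`, the Gaussian weighting bandwidth `sigma`, and the exact way shifted-clip gradients are aggregated are assumptions for illustration; the paper's Algorithm 1 uses the parameters ϵ, I, L, and W described in the Experiment Setup row (ϵ = 16, I = 10, L = 7, W Gaussian).

```python
import numpy as np

def gaussian_weights(L, sigma=2.0):
    """Gaussian weights over temporal shifts -L..L (sigma is illustrative)."""
    shifts = np.arange(-L, L + 1)
    w = np.exp(-shifts**2 / (2.0 * sigma**2))
    return w / w.sum()

def tt_attack(grad_fn, x, eps=16.0, iters=10, L=7, sigma=2.0):
    """Sketch of a temporal-translation (TT) iterative attack.

    grad_fn(clip) must return the loss gradient w.r.t. the input clip,
    whose first axis is time (frames). Gradients of temporally shifted
    clips are averaged with Gaussian weights, then a signed step of
    size alpha = eps / iters is taken and clipped to the eps-ball
    around the clean clip x, matching x_{i+1} = clip_{x,eps}(x_i + alpha*g).
    """
    alpha = eps / iters
    weights = gaussian_weights(L, sigma)
    x_adv = x.copy()
    for _ in range(iters):
        g = np.zeros_like(x)
        for shift, w in zip(range(-L, L + 1), weights):
            shifted = np.roll(x_adv, shift, axis=0)          # translate along time
            g += w * np.roll(grad_fn(shifted), -shift, axis=0)  # align back
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv
```

With a constant positive toy gradient, ten steps of size α = 1.6 accumulate exactly the budget ϵ = 16, so `tt_attack(lambda z: np.ones_like(z), np.zeros((8, 4)))` returns a clip of all 16s.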