TransFool: An Adversarial Attack against Neural Machine Translation Models

Authors: Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks both in white-box and black-box settings.
Researcher Affiliation Collaboration Sahar Sadrizadeh EMAIL EPFL, Lausanne, Switzerland Ljiljana Dolamic EMAIL Armasuisse S+T, Thun, Switzerland Pascal Frossard EMAIL EPFL, Lausanne, Switzerland
Pseudocode Yes Algorithm 1: TransFool Adversarial Attack
Open Source Code Yes Our source code is available at https://github.com/sssadrizadeh/TransFool. Appendix G also contains the license information and details of the assets (datasets, codes, and models).
Open Datasets Yes We conduct experiments on the English-French (En-Fr), English-German (En-De), and English-Chinese (En-Zh) translation tasks. We use the test set of WMT14 (Bojar et al., 2014) for En-Fr and En-De tasks, and the test set of OPUS-100 (Zhang et al., 2020) for En-Zh task. Some statistics of these datasets are presented in Appendix A. As explained in Section 4, the similarity constraint and the LM loss of the proposed optimization problem require an FC layer and a CLM. To this aim, for each NMT model, we train an FC layer and a CLM (with GPT-2 structure (Radford et al., 2019)) on the WikiText-103 dataset.
Dataset Splits Yes We conduct experiments on the English-French (En-Fr), English-German (En-De), and English-Chinese (En-Zh) translation tasks. We use the test set of WMT14 (Bojar et al., 2014) for En-Fr and En-De tasks, and the test set of OPUS-100 (Zhang et al., 2020) for En-Zh task. Some statistics of these datasets are presented in Appendix A.
Hardware Specification Yes For the Marian NMT (En-Fr) model, on a system equipped with an NVIDIA A100 GPU, TransFool takes 26.45 seconds to generate adversarial examples. On the same system, kNN needs 1.45 seconds and Seq2Sick needs 38.85 seconds to generate adversarial examples; however, these attacks are less effective.
Software Dependencies No We used the models and datasets that are available in Hugging Face transformers (Wolf et al., 2020) and datasets (Lhoest et al., 2021) libraries. Moreover, we used PyTorch for all experiments (Paszke et al., 2019), which is released under the BSD license. Specific version numbers for the Hugging Face libraries are not explicitly provided.
Experiment Setup Yes To find the minimizer of our optimization problem (1), we use the Adam optimizer (Kingma & Ba, 2014) with step size γ = 0.016. Moreover, we set the maximum number of iterations to 500. Our algorithm has three parameters: coefficients α and β in the optimization function (1), and the relative BLEU score ratio λ in the stopping criterion (7). We set λ = 0.4, β = 1.8, and α = 20.
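The reported setup (Adam with γ = 0.016, up to 500 iterations, and loss coefficients α = 20 and β = 1.8) can be sketched as follows. This is a minimal illustration only: the real attack optimizes a perturbation in the NMT model's embedding space, and the adversarial, similarity, and LM losses below are stand-in placeholders, not the paper's actual loss functions.

```python
import torch

torch.manual_seed(0)

# Hypothetical sentence embedding (seq_len x embed_dim); in TransFool this
# would come from the target NMT model's embedding layer.
orig_emb = torch.randn(10, 512)
pert = torch.zeros_like(orig_emb, requires_grad=True)

# Hyperparameters as reported in the paper.
alpha, beta, gamma, max_iters = 20.0, 1.8, 0.016, 500
optimizer = torch.optim.Adam([pert], lr=gamma)

def adv_loss(e):        # placeholder for the translation-degradation term
    return -e.pow(2).mean()

def sim_loss(e, ref):   # placeholder for the semantic-similarity term
    return (e - ref).pow(2).mean()

def lm_loss(e):         # placeholder for the language-model fluency term
    return e.abs().mean()

for step in range(max_iters):
    optimizer.zero_grad()
    adv_emb = orig_emb + pert
    # Weighted combination mirroring the shape of optimization problem (1).
    loss = adv_loss(adv_emb) + alpha * sim_loss(adv_emb, orig_emb) \
           + beta * lm_loss(adv_emb)
    loss.backward()
    optimizer.step()
    # In the real attack, the loop stops early once the BLEU score of the
    # perturbed translation falls below lambda = 0.4 times the original.
```

The per-iteration gradient step is what makes this a white-box attack: it requires access to the target model's gradients with respect to the input embeddings.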