TransFool: An Adversarial Attack against Neural Machine Translation Models

Authors: Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high. Moreover, we show that TransFool is transferable to unknown target models. Finally, based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks both in white-box and black-box settings.
Researcher Affiliation Collaboration Sahar Sadrizadeh EMAIL EPFL, Lausanne, Switzerland Ljiljana Dolamic EMAIL Armasuisse S+T, Thun, Switzerland Pascal Frossard EMAIL EPFL, Lausanne, Switzerland
Pseudocode Yes Algorithm 1: TransFool Adversarial Attack
Open Source Code Yes Our source code is available at https://github.com/sssadrizadeh/TransFool. Appendix G also contains the license information and details of the assets (datasets, codes, and models).
Open Datasets Yes We conduct experiments on the English-French (En-Fr), English-German (En-De), and English-Chinese (En-Zh) translation tasks. We use the test set of WMT14 (Bojar et al., 2014) for En-Fr and En-De tasks, and the test set of OPUS-100 (Zhang et al., 2020) for En-Zh task. Some statistics of these datasets are presented in Appendix A. As explained in Section 4, the similarity constraint and the LM loss of the proposed optimization problem require an FC layer and a CLM. To this aim, for each NMT model, we train an FC layer and a CLM (with GPT-2 structure (Radford et al., 2019)) on the WikiText-103 dataset.
Dataset Splits Yes We conduct experiments on the English-French (En-Fr), English-German (En-De), and English-Chinese (En-Zh) translation tasks. We use the test set of WMT14 (Bojar et al., 2014) for En-Fr and En-De tasks, and the test set of OPUS-100 (Zhang et al., 2020) for En-Zh task. Some statistics of these datasets are presented in Appendix A.
Hardware Specification Yes For the Marian NMT (En-Fr) model, on a system equipped with an NVIDIA A100 GPU, TransFool takes 26.45 seconds to generate adversarial examples. On the same system, kNN needs 1.45 seconds and Seq2Sick needs 38.85 seconds to generate adversarial examples; however, these attacks are less effective.
Software Dependencies No We used the models and datasets that are available in Hugging Face transformers (Wolf et al., 2020) and datasets (Lhoest et al., 2021) libraries. Moreover, we used PyTorch for all experiments (Paszke et al., 2019), which is released under the BSD license. Specific version numbers for the Hugging Face libraries are not explicitly provided.
Experiment Setup Yes To find the minimizer of our optimization problem (1), we use the Adam optimizer (Kingma & Ba, 2014) with step size γ = 0.016. Moreover, we set the maximum number of iterations to 500. Our algorithm has three parameters: coefficients α and β in the optimization function (1), and the relative BLEU score ratio λ in the stopping criterion (7). We set λ = 0.4, β = 1.8, and α = 20.
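The reported setup (Adam with γ = 0.016, up to 500 iterations, and loss coefficients α = 20 and β = 1.8) can be sketched as follows. This is a minimal illustration only: the real attack optimizes a perturbation in the NMT model's embedding space, and the adversarial, similarity, and LM losses below are stand-in placeholders, not the paper's actual loss functions.

```python
import torch

torch.manual_seed(0)

# Hypothetical sentence embedding (seq_len x embed_dim); in TransFool this
# would come from the target NMT model's embedding layer.
orig_emb = torch.randn(10, 512)
pert = torch.zeros_like(orig_emb, requires_grad=True)

# Hyperparameters as reported in the paper.
alpha, beta, gamma, max_iters = 20.0, 1.8, 0.016, 500
optimizer = torch.optim.Adam([pert], lr=gamma)

def adv_loss(e):        # placeholder for the translation-degradation term
    return -e.pow(2).mean()

def sim_loss(e, ref):   # placeholder for the semantic-similarity term
    return (e - ref).pow(2).mean()

def lm_loss(e):         # placeholder for the language-model fluency term
    return e.abs().mean()

for step in range(max_iters):
    optimizer.zero_grad()
    adv_emb = orig_emb + pert
    # Weighted combination mirroring the shape of optimization problem (1).
    loss = adv_loss(adv_emb) + alpha * sim_loss(adv_emb, orig_emb) \
           + beta * lm_loss(adv_emb)
    loss.backward()
    optimizer.step()
    # In the real attack, the loop stops early once the BLEU score of the
    # perturbed translation falls below lambda = 0.4 times the original.
```

The per-iteration gradient step is what makes this a white-box attack: it requires access to the target model's gradients with respect to the input embeddings.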