Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers

Authors: Ruofei Wang, Hongzhan Lin, Ziyuan Luo, Ka Chun Cheung, Simon See, Jing Ma, Renjie Wan

AAAI 2025

Reproducibility summary (Variable: Result, with supporting LLM response):

Research Type: Experimental. "Extensive experiments conducted on three public datasets demonstrate the effectiveness and stealthiness of our CMT."

Researcher Affiliation: Collaboration. "Ruofei Wang (1,2), Hongzhan Lin (1), Ziyuan Luo (1,2), Ka Chun Cheung (2), Simon See (2), Jing Ma (1), Renjie Wan (1)*; (1) Department of Computer Science, Hong Kong Baptist University; (2) NVIDIA AI Technology Center, NVIDIA"

Pseudocode: Yes. "Details about this function are shown in lines 2 to 11 of Algorithm 1 in our Supplementary Materials."

Open Source Code: No. The paper does not state that the authors' own implementation code is openly available, and it provides no direct link to it. It only mentions using the third-party MMF benchmark: "We use the MMF benchmark (Singh et al. 2020) with default settings (e.g., iterations, cross-entropy loss function, etc) to conduct our comparison experiments." and "MMF: A multimodal framework for vision and language research. https://github.com/facebookresearch/mmf."

Open Datasets: Yes. "We consider three widely used hateful meme detection datasets: FBHM (Kiela et al. 2020), MAMI (Fersini et al. 2022), and Harmeme (Pramanick et al. 2021) in our experiments."

Dataset Splits: Yes. "The details of each dataset are shown in Table 1." Train/Dev/Test splits: FBHM 8500/500/1000; MAMI 8000/1000/1000; Harmeme 3013/177/354.

Hardware Specification: No. The paper does not provide specific hardware details. It only mentions training settings for a model: "ResNet-152 (He et al. 2016) is chosen and trained for 100 epochs with a learning rate of 0.001, using an SGD optimizer."

Software Dependencies: No. The paper mentions using the "MMF benchmark (Singh et al. 2020)" but does not provide version numbers for any software, libraries, or frameworks used in its implementation.

Experiment Setup: Yes. "We randomly sample clean data from the training set to inject triggers according to the poison ratio: ρ = 1%. We set the trigger scaling parameter ϵ = 1/8. For CMT, the blending parameter is λ = 0.2. ... ResNet-152 (He et al. 2016) is chosen and trained for 100 epochs with a learning rate of 0.001, using an SGD optimizer."
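To make the quoted setup concrete, here is a minimal sketch of the two poisoning steps the paper describes: sampling a ρ = 1% subset of the training set to poison, and blending a trigger into each sampled image with λ = 0.2. This is not the authors' released code (none is linked); the function names are hypothetical, and the convex blend (1 − λ)·x + λ·t is a common convention for blending-style backdoor triggers, assumed here for illustration.

```python
import numpy as np

# Hyperparameters quoted from the paper's experiment setup.
POISON_RATIO = 0.01   # rho: fraction of training samples that receive a trigger
EPSILON = 1 / 8       # trigger scaling parameter (controls trigger size)
LAMBDA = 0.2          # blending parameter for CMT

def sample_poison_indices(n_train, ratio=POISON_RATIO, seed=0):
    """Randomly pick which training samples to poison (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(n_train * ratio))
    return rng.choice(n_train, size=n_poison, replace=False)

def blend_trigger(image, trigger, lam=LAMBDA):
    """Blend a trigger into a clean image: (1 - lam) * x + lam * t (assumed form)."""
    return (1.0 - lam) * image + lam * trigger

# Example on an FBHM-sized training set (8500 samples, per Table 1).
idx = sample_poison_indices(8500)          # 1% -> 85 poisoned samples
clean = np.zeros((3, 32, 32))              # dummy clean image
trigger = np.ones((3, 32, 32))             # dummy trigger patch
poisoned = blend_trigger(clean, trigger)   # every pixel becomes 0.2
```

The poisoned set would then be mixed back into training before fitting the detector (the paper trains ResNet-152 for 100 epochs with SGD at learning rate 0.001); that training loop is omitted here since the paper does not specify it beyond those settings.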