Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers
Authors: Ruofei Wang, Hongzhan Lin, Ziyuan Luo, Ka Chun Cheung, Simon See, Jing Ma, Renjie Wan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on three public datasets demonstrate the effectiveness and stealthiness of our CMT. |
| Researcher Affiliation | Collaboration | Ruofei Wang1,2, Hongzhan Lin1, Ziyuan Luo1,2, Ka Chun Cheung2, Simon See2, Jing Ma1, Renjie Wan1* 1Department of Computer Science, Hong Kong Baptist University 2NVIDIA AI Technology Center, NVIDIA |
| Pseudocode | Yes | Details about this function are shown in lines 2 to 11 of Algorithm 1 in our Supplementary Materials. |
| Open Source Code | No | The paper does not explicitly state that the authors' own implementation code for the methodology is openly available or provide a direct link to it. It only mentions using a third-party MMF benchmark: "We use the MMF benchmark (Singh et al. 2020) with default settings (e.g., iterations, cross-entropy loss function, etc) to conduct our comparison experiments." and "MMF: A multimodal framework for vision and language research. https://github.com/facebookresearch/mmf." |
| Open Datasets | Yes | We consider three widely used hateful meme detection datasets: FBHM (Kiela et al. 2020), MAMI (Fersini et al. 2022), and Harmeme (Pramanick et al. 2021) in our experiments. |
| Dataset Splits | Yes | The details of each dataset are shown in Table 1. ... Train/Dev/Test: FBHM 8500/500/1000, MAMI 8000/1000/1000, Harmeme 3013/177/354 |
| Hardware Specification | No | The paper does not provide specific hardware details. It only mentions training settings for a model: "ResNet-152 (He et al. 2016) is chosen and trained for 100 epochs with a learning rate of 0.001, using an SGD optimizer." |
| Software Dependencies | No | The paper mentions using "MMF benchmark (Singh et al. 2020)" but does not provide specific version numbers for any software, libraries, or frameworks used in their implementation. |
| Experiment Setup | Yes | We randomly sample clean data from the training set to inject triggers according to the poison ratio: ρ = 1%. We set the trigger scaling parameter ϵ = 1/8. For CMT, the blending parameter is λ = 0.2. ... ResNet-152 (He et al. 2016) is chosen and trained for 100 epochs with a learning rate of 0.001, using an SGD optimizer. |
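The reported setup fixes three poisoning hyperparameters (poison ratio ρ = 1%, trigger scale ϵ = 1/8, blending weight λ = 0.2). A minimal sketch of how such a setup could be wired up is below; the function names, the corner placement of the trigger, and the nearest-neighbour resize are illustrative assumptions, not the authors' CMT implementation.

```python
import numpy as np

# Hyperparameters as reported in the experiment setup.
RHO = 0.01      # poison ratio: fraction of training samples to poison
EPSILON = 1 / 8 # trigger scaling parameter (patch side = EPSILON * image side)
LAMBDA = 0.2    # blending parameter for the trigger

def blend_trigger(image: np.ndarray, trigger: np.ndarray) -> np.ndarray:
    """Blend a scaled trigger patch into the top-left corner (illustrative choice)."""
    h, w = image.shape[:2]
    th, tw = int(h * EPSILON), int(w * EPSILON)
    # Naive nearest-neighbour resize of the trigger to (th, tw).
    ys = np.arange(th) * trigger.shape[0] // th
    xs = np.arange(tw) * trigger.shape[1] // tw
    patch = trigger[ys][:, xs]
    poisoned = image.astype(np.float64).copy()
    poisoned[:th, :tw] = (1 - LAMBDA) * poisoned[:th, :tw] + LAMBDA * patch
    return poisoned.clip(0, 255).astype(image.dtype)

def select_poison_indices(n_train: int, seed: int = 0) -> np.ndarray:
    """Randomly sample round(RHO * n_train) clean training samples to poison."""
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(round(RHO * n_train)))
    return rng.choice(n_train, size=n_poison, replace=False)
```

For example, with FBHM's 8500 training samples and ρ = 1%, `select_poison_indices(8500)` picks 85 samples to receive the blended trigger.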