TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning
Authors: Yupei Liu, Yanting Wang, Jinyuan Jia
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) from a given test input and recover it under state-of-the-art trojan attacks. We further demonstrate by experiments that our TrojanDec outperforms the state-of-the-art defenses. In this work, we propose TrojanDec, the first framework to identify and restore a trojaned test image in the self-supervised learning context. To show the effectiveness of our TrojanDec, we conduct extensive experiments on multiple pre-training datasets and downstream tasks under state-of-the-art trojan attacks (Jia, Liu, and Gong 2022; Saha et al. 2022; Liu, Jia, and Gong 2022; Zhang et al. 2024) to self-supervised learning. Our results show that our defense is consistently effective under those attacks. We further generalize several representative backdoor attacks (Chen et al. 2017; Salem et al. 2022) designed for supervised learning to the self-supervised learning context as adaptive attacks and show the effectiveness of our method in defending against them. To study the impact of hyperparameters, we perform a comprehensive ablation study in our evaluation. |
| Researcher Affiliation | Academia | Yupei Liu, Yanting Wang, Jinyuan Jia The Pennsylvania State University EMAIL |
| Pseudocode | Yes | Algorithm 1: Metadata Extraction |
| Open Source Code | No | The paper mentions adopting publicly available implementations of other works for attacks (e.g., "We adopt the publicly available implementation of (Saha et al. 2022; Jia, Liu, and Gong 2022)") and using a pre-trained diffusion model (Wang 2022), but it does not provide an explicit statement or a direct link to the source code for the methodology of TrojanDec itself. |
| Open Datasets | Yes | Dataset and models: We consider CIFAR10 (Krizhevsky 2009) and STL10 (Coates, Ng, and Lee 2011) as the datasets to pre-train the self-supervised learning encoder. When CIFAR10 is used as the pre-training dataset, we use STL10, SVHN (Netzer et al. 2011), and EuroSAT (Helber et al. 2019) as the downstream datasets. When STL10 is used to pre-train the encoder, we use CIFAR10, SVHN, and EuroSAT as the downstream datasets. We resize all images to 32 × 32 for consistency. The details of these datasets can be found in Table 5 in our technical report (Liu, Wang, and Jia 2024). To further evaluate the effectiveness of TrojanDec, we apply it to 2 real-world self-supervised learning encoders: 1) the encoder pre-trained on ImageNet released by Google (Google 2020) and 2) the CLIP encoder pre-trained on 400 million image-text pairs released by OpenAI (OpenAI 2021). |
| Dataset Splits | No | The paper mentions using various datasets for pre-training and downstream tasks and refers to applying their method to "training and testing data of the downstream classifier". However, it does not explicitly provide specific percentages, sample counts, or a detailed methodology for splitting these datasets into training, validation, or testing sets within the main text. It defers to a technical report for "details of these datasets", but this information is not directly in the main paper. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using a "pre-trained diffusion model" and adopting the "publicly available implementation" of other attacks, but it does not explicitly list specific software dependencies (e.g., programming language versions, library names with version numbers like Python 3.8, PyTorch 1.9) used for the implementation of TrojanDec itself. |
| Experiment Setup | Yes | Our TrojanDec has two parameters: k (mask size) and s (step size). By default, we set k and s to 15 and 1, respectively. We will study the impact of each of them in the ablation study. |
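The reported defaults (k = 15, s = 1) could be captured in a small configuration sketch like the one below. This is purely illustrative: the class and field names are hypothetical and not taken from any released TrojanDec code, and the interpretation of s as a sliding stride is an assumption based on the paper's description of k as a mask size.

```python
from dataclasses import dataclass

@dataclass
class TrojanDecConfig:
    """Hypothetical container for the two hyperparameters the paper reports.

    Field names are illustrative, not from an official implementation.
    """
    mask_size: int = 15  # k: size of the mask applied to the test image (paper default)
    step_size: int = 1   # s: step size; assumed to be the stride of the mask (paper default)

cfg = TrojanDecConfig()
print(cfg.mask_size, cfg.step_size)  # 15 1
```

Overriding either field (e.g., `TrojanDecConfig(mask_size=10)`) would correspond to the settings varied in the paper's ablation study.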