Protecting against simultaneous data poisoning attacks
Authors: Neel Alex, Muhammad Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that multiple backdoors can be simultaneously installed... Furthermore, we show that existing backdoor defense methods do not effectively defend... Finally, we leverage insights... to develop a new defense, BaDLoss (Backdoor Detection via Loss Dynamics), that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% on CIFAR-10, 10.29% on GTSRB, and 19.17% on Imagenette, compared to the average of other defenses at 63.44%, 74.83%, and 41.74% respectively. BaDLoss scales to ImageNet-1k, reducing the average attack success rate from 88.57% to 15.61%. |
| Researcher Affiliation | Academia | Neel Alex, University of Cambridge; Shoaib Ahmed Siddiqui, University of Cambridge; Amartya Sanyal, Department of Computer Science, University of Copenhagen; David Krueger, Mila, University of Montreal |
| Pseudocode | Yes | Appendix A (BaDLoss Pseudocode), Algorithm 1: PyTorch pseudocode for BaDLoss. |
| Open Source Code | Yes | We open-source our code to aid replication and further study, available on GitHub: https://github.com/shoaibahmed/badloss/ |
| Open Datasets | Yes | We use the standard computer vision datasets: CIFAR-10 (Krizhevsky, 2009), GTSRB (Houben et al., 2013), and Imagenette (Howard, 2019)... ImageNet-1k (Deng et al., 2009). |
| Dataset Splits | Yes | Additionally, we assume that the defender has access to a small set of guaranteed clean examples (250 examples in our case)... Attack success rate (ASR) is evaluated on the full test set excluding the target class... The overall fraction of the dataset which is poisoned is approximately 8% on CIFAR-10, 10% on GTSRB, and 9% on Imagenette. |
| Hardware Specification | No | The paper mentions the use of ResNet-50 and ResNet-18 architectures but does not specify any hardware details like GPU models, CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | We train using PyTorch (Ansel et al., 2024). Our nearest neighbors classifier uses scikit-learn (Pedregosa et al., 2011). Plots were generated with Matplotlib (Hunter, 2007). While software names are mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | We use the AdamW optimizer with learning rate γ = 1e-3 (with a cosine-annealing learning rate schedule), weight decay λ = 1e-4, and (β1, β2) = (0.9, 0.999). We train for 100 epochs on CIFAR-10 and GTSRB, and 250 epochs on Imagenette... In CIFAR-10, we use a batch size of 128. In GTSRB, Imagenette, and ImageNet, we use a batch size of 256. In CIFAR-10, we use a crop-and-pad (4px max) and random horizontal flip augmentation... In GTSRB, we use no augmentations. In Imagenette and ImageNet, we use random resized crop (scale=0.08-1) and random flip augmentations. |
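For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary only (the structure and names below are our own, not taken from the authors' repository); it encodes exactly the values reported in the paper.

```python
# Optimizer settings reported in the paper: AdamW with cosine-annealing schedule.
OPTIMIZER = {
    "name": "AdamW",
    "lr": 1e-3,            # learning rate gamma
    "weight_decay": 1e-4,  # lambda
    "betas": (0.9, 0.999),
    "lr_schedule": "cosine_annealing",
}

# Per-dataset training settings as reported (epochs for ImageNet-1k not quoted here).
DATASETS = {
    "CIFAR-10": {
        "epochs": 100,
        "batch_size": 128,
        "augmentations": ["crop_and_pad_4px", "random_horizontal_flip"],
    },
    "GTSRB": {
        "epochs": 100,
        "batch_size": 256,
        "augmentations": [],
    },
    "Imagenette": {
        "epochs": 250,
        "batch_size": 256,
        "augmentations": ["random_resized_crop_scale_0.08_1", "random_flip"],
    },
    "ImageNet-1k": {
        "batch_size": 256,
        "augmentations": ["random_resized_crop_scale_0.08_1", "random_flip"],
    },
}
```

Laying the settings out this way makes the reproducibility gaps visible at a glance: every field above is stated in the paper, while anything absent (e.g. ImageNet-1k epoch count, hardware, library versions) would need to be recovered from the released code.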