On the Adversarial Vulnerability of Label-Free Test-Time Adaptation

Authors: Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami, Francesco Restuccia

ICLR 2025

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C, we demonstrate that our proposed approach closely matches the performance of state-of-the-art attack benchmarks, even without access to labeled samples. In certain cases, our approach generates stronger attacks, e.g., more than 4% higher error rate on CIFAR10-C. Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git."
Researcher Affiliation | Collaboration | Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami, and Francesco Restuccia (Northeastern University, United States; DEVCOM Army Research Laboratory, United States; Air Force Research Laboratory, United States)
Pseudocode | Yes | Algorithm 1: FCA Algorithm
Open Source Code | Yes | "Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git."
Open Datasets | Yes | "We leverage three primary benchmark datasets typically used for TTA performance evaluation, i.e., CIFAR10-C, CIFAR100-C, and ImageNet-C. We directly obtain the CIFAR10-C and CIFAR100-C test datasets from RobustBench (Croce et al., 2020). For ImageNet-C, we use the data provided by Hendrycks & Dietterich (2019)."
Dataset Splits | Yes | "Unless otherwise specified, we use a test batch size of 200 for each trial, where 20% of samples are selected as compromised ones."
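The reported split (batch of 200, 20% compromised) can be sketched as follows; the function name, random sampling mechanism, and seed are illustrative assumptions, not details from the paper.

```python
import random

def select_compromised(batch_size=200, compromised_frac=0.20, seed=0):
    """Pick which indices in a test batch are treated as compromised.

    Illustrative sketch of the paper's reported split (batch size 200,
    20% compromised); the sampling mechanism itself is an assumption.
    """
    rng = random.Random(seed)
    n_compromised = int(batch_size * compromised_frac)
    compromised = sorted(rng.sample(range(batch_size), n_compromised))
    benign = [i for i in range(batch_size) if i not in set(compromised)]
    return compromised, benign

compromised, benign = select_compromised()
print(len(compromised), len(benign))  # 40 160
```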
Hardware Specification | No | The paper does not specify the hardware used for its experiments (GPU/CPU models, processor speeds, memory amounts, or other system details).
Software Dependencies | No | The paper mentions ResNet-32 and ResNet-50 models and refers to "pytorch-cifar-models" and "torchvision (resnet50-v2)", but it does not specify versions for general software dependencies such as Python, PyTorch, or CUDA, which are needed for replication.
Experiment Setup | Yes | "Unless otherwise specified, we use a test batch size of 200 for each trial, where 20% of samples are selected as compromised ones, adversarial learning rate α = 2/255, perturbation constraint ϵ = 8/255, and 100 iteration steps for the attack."
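The stated hyperparameters (step size α = 2/255, L∞ budget ϵ = 8/255, 100 steps) match the usual projected-gradient-style attack loop. A minimal one-dimensional sketch is below; the toy loss, analytic gradient, and function name are assumptions for illustration and are not the paper's FCA algorithm.

```python
def pgd_attack_1d(x0, grad_fn, alpha=2/255, eps=8/255, steps=100):
    """Signed-gradient ascent on a scalar input with projection onto an
    L-infinity eps-ball around x0 and clipping to the valid [0, 1] range.

    Hyperparameters mirror the paper's reported setup; everything else
    here is an illustrative sketch, not the paper's actual attack.
    """
    x = x0
    for _ in range(steps):
        x = x + alpha * (1.0 if grad_fn(x) >= 0 else -1.0)  # signed gradient step
        x = max(x0 - eps, min(x0 + eps, x))                 # project into eps-ball
        x = max(0.0, min(1.0, x))                           # clip to pixel range
    return x

# Toy loss L(x) = (x - 1)^2 with gradient 2*(x - 1); ascent pushes x away
# from the minimum at 1, and the projection caps the perturbation at eps.
x_adv = pgd_attack_1d(0.5, lambda x: 2 * (x - 1))
print(abs(x_adv - 0.5) <= 8/255 + 1e-9)  # True
```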