Provably Safeguarding a Classifier from OOD and Adversarial Samples
Authors: Nicolas Atienza, Johanne Cohen, Christophe Labreuche, Michele Sebag
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation of the approach is conducted on various neural architectures (ResNet, VGG, and Vision Transformer) and considers medium- and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet). The results show the stability and frugality of the GEV model and demonstrate SPADE's efficiency compared to the state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Nicolas Atienza (Thales cortAIx Labs, Industrial AI Laboratory SINCLAIR; LISN, CNRS-INRIA, Paris-Saclay University); Christophe Labreuche (Thales cortAIx Labs, Industrial AI Laboratory SINCLAIR, Palaiseau, France); Johanne Cohen (LISN, CNRS-INRIA, Paris-Saclay University, Saclay, France); Michèle Sebag (LISN, CNRS-INRIA, Paris-Saclay University, Saclay, France) |
| Pseudocode | Yes | Overall, the SPADE abstaining classifier relies on Def. 5, where the GEV models are learned using Alg. 1 ("Algorithm 1: SPADE, learning EVT models"). |
| Open Source Code | Yes | 1The code is publicly available at https://github.com/natixx14/SPADE |
| Open Datasets | Yes | Three medium- and large-sized datasets are considered: CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and ImageNet-1K (using the ILSVRC2012 version). For CIFAR-10, near-OOD samples are from CIFAR-100, while far-OOD samples originate from MNIST, SVHN, Texture, and Places365. For CIFAR-100, near-OOD samples come from ImageNet-1K and Tiny-ImageNet, whereas far-OOD samples are taken from MNIST, SVHN, Texture and Places365. For ImageNet-1K, near-OOD samples are from SSB-Hard (Vaze et al., 2022) and NINCO (Bitterwolf et al., 2023), while far-OOD samples come from iNaturalist (Horn et al., 2018), Texture (Cimpoi et al., 2014), and OpenImage-O (Wang et al., 2022) datasets. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10, CIFAR-100, and Image Net, and sourcing OOD samples from other datasets. While it implies using standard splits by fine-tuning pre-trained models, it does not explicitly state the specific training, validation, or test split percentages or sample counts for the main datasets used in their experiments. |
| Hardware Specification | Yes | All experiments are conducted on Tesla A100 80GB GPUs. |
| Software Dependencies | No | All experiments are implemented using PyTorch (Paszke et al., 2019) and the OpenOOD package (Zhang et al., 2023). While the software names are mentioned, specific version numbers for PyTorch and OpenOOD are not provided. |
| Experiment Setup | Yes | All teacher models are trained using stochastic gradient backpropagation, with the learning rate adjusted by the Adam optimizer (Kingma & Ba, 2015). The models, initially pretrained on ImageNet, are sourced from the PyTorch Hub and fine-tuned (if necessary) on the considered datasets (CIFAR-10, CIFAR-100). This fine-tuning consists of adding a new fully connected head layer with ReLU activation functions on top of the frozen latent space. The head is trained with a dropout rate of 0.5. |
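The fine-tuning recipe described in the Experiment Setup row (a new fully connected head with ReLU activations and dropout 0.5 on top of a frozen, pretrained latent space, optimized with Adam) can be sketched as follows. This is a minimal illustration, not the authors' code: the toy backbone, layer sizes, and learning rate are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions; the paper's actual backbones (ResNet, VGG, ViT)
# would replace the toy backbone below.
feature_dim, hidden_dim, num_classes = 512, 256, 10

# Stand-in for a pretrained feature extractor, frozen as described above.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim))
for p in backbone.parameters():
    p.requires_grad = False  # frozen latent space

# New fully connected head with ReLU activation and dropout rate 0.5.
head = nn.Sequential(
    nn.Linear(feature_dim, hidden_dim),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(hidden_dim, num_classes),
)

model = nn.Sequential(backbone, head)
# Only the head is trained; Adam adjusts the learning rate as in the paper.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(4, 3, 32, 32)  # CIFAR-sized dummy batch
logits = model(x)
print(logits.shape)
```

Freezing the backbone keeps the latent space fixed, so only the head's parameters appear in the optimizer; this matches the row's description of training the head alone.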
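The Pseudocode row notes that SPADE learns generalized extreme value (GEV) models to decide when to abstain. A minimal sketch of that idea, assuming per-class confidence scores and using `scipy.stats.genextreme` (the score source, threshold, and function names here are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical in-distribution confidence scores; in SPADE these would come
# from the trained classifier on validation data.
rng = np.random.default_rng(0)
scores = rng.gumbel(loc=5.0, scale=0.8, size=2000)

# Fit a GEV distribution (shape, location, scale) to the observed scores.
shape, loc, scale = genextreme.fit(scores)

def should_abstain(score, tau=0.01):
    """Abstain when the score is implausibly low under the fitted GEV model,
    i.e. its tail probability falls below the threshold tau (illustrative)."""
    return genextreme.cdf(score, shape, loc=loc, scale=scale) < tau

print(should_abstain(1.0))  # far below the in-distribution bulk
print(should_abstain(5.0))  # near the in-distribution mode
```

The design intuition is that extreme value theory gives a principled model of the tail of the score distribution, so rejection thresholds come from a fitted distribution rather than an ad hoc cutoff.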