Open-Set Recognition: A Good Closed-Set Classifier is All You Need

Authors: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we first demonstrate that the ability of a classifier to make the none-of-the-above decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of the maximum softmax probability (MSP) OSR baseline by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks.
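The MSP baseline referenced above scores a test sample by its largest softmax probability and treats low-confidence samples as "unknown". A minimal sketch of that scoring rule (the toy logits and the 0.5 threshold are hypothetical, not values from the paper):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    # Open-set score: higher means "more likely a known (closed-set) class".
    return softmax(logits).max(axis=-1)

# Toy logits for two samples: one confident, one near-uniform.
logits = np.array([[8.0, 0.5, 0.2],    # sharply peaked -> known
                   [1.1, 1.0, 0.9]])   # flat -> likely unknown
scores = msp_score(logits)
threshold = 0.5  # hypothetical operating point
is_known = scores > threshold
```

Because the score is just the classifier's own confidence, any change that raises closed-set accuracy (stronger augmentation, better schedules) tends to sharpen these scores, which is the mechanism the paper exploits.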
Researcher Affiliation Academia Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman; Visual Geometry Group, University of Oxford; The University of Hong Kong.
Pseudocode No No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code Yes Code available at: https://github.com/sgvaze/osr_closed_set_all_you_need.
Open Datasets Yes MNIST (LeCun et al., 2010), SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky, 2009): These are ten-class datasets... TinyImageNet (Le & Yang, 2015)... ImageNet-21K-P (Ridnik et al., 2021)... Caltech-UCSD Birds (CUB) (Wah et al., 2011), Stanford Cars (Krause et al., 2013), FGVC-Aircraft (Maji et al., 2013).
Dataset Splits Yes In all cases, the model is trained on a subset of classes, while other classes are reserved as unseen for evaluation. MNIST (LeCun et al., 2010), SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky, 2009): ...training on six classes, while using the other four classes for testing (|C| = 6; |U| = 4). CIFAR + N ... training on four classes from CIFAR10, while using N classes from CIFAR100 for evaluation, where N denotes either 10 or 50 classes (|C| = 4; |U| ∈ {10, 50}). TinyImageNet ... 20 classes used for training and 180 as unknown (|C| = 20; |U| = 180). ... For both Easy and Hard splits, we have |C| = 1000 and |U| = 1000. ... We select the RandAugment and label smoothing hyper-parameters by maximizing closed-set accuracy on a validation set (randomly sampling 20% of the training set).
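The split protocol quoted above (reserve a class subset as closed-set, hold out 20% of training samples for validation) can be sketched as follows; the function names and the CIFAR10-style defaults (10 classes, 6 known) are illustrative, not the paper's code:

```python
import random

def make_osr_split(num_classes=10, num_known=6, seed=0):
    # Randomly reserve `num_known` classes as the closed set C;
    # the remaining classes form the unseen set U.
    rng = random.Random(seed)
    classes = list(range(num_classes))
    rng.shuffle(classes)
    return sorted(classes[:num_known]), sorted(classes[num_known:])

def holdout_val(train_indices, frac=0.2, seed=0):
    # Randomly sample a fraction of the training set as validation,
    # as the report quotes for tuning RandAugment / label smoothing.
    rng = random.Random(seed)
    idx = list(train_indices)
    rng.shuffle(idx)
    cut = int(len(idx) * frac)
    return idx[cut:], idx[:cut]  # (train, val)

known, unknown = make_osr_split()            # |C| = 6, |U| = 4
train_idx, val_idx = holdout_val(range(100)) # 80 / 20 sample split
```

Note the validation set is drawn only from the known classes' training data, so hyper-parameters are tuned on closed-set accuracy without touching the unseen classes.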
Hardware Specification Yes We train all models for 600 epochs with a batch size of 128, training models on a single NVIDIA Titan X GPU. ... the memory intensive nature of the method meant we could only fit a batch size of 2 on a 12GB GPU. We attempted to scale it up for the FGVC datasets, fitting a batch size of 16 across 4 24GB GPUs, with training taking a week.
Software Dependencies No No specific version numbers for key software components (e.g., Python, PyTorch, CUDA libraries) were found. The paper mentions using a 'ResNet50 model pre-trained with the cross-entropy loss on ImageNet-1K from (Wightman, 2019)', which points to 'PyTorch Image Models' (timm), but does not specify their own direct dependencies with versions.
Experiment Setup Yes We train all models for 600 epochs with a batch size of 128. ... We use an initial learning rate of 0.1 for all datasets except TinyImageNet, for which we use 0.01. We train with a cosine annealed learning rate, restarting the learning rate to the initial value at epochs 200 and 400. Furthermore, we warm up the learning rate by linearly increasing it from 0 to the initial value at epoch 20. ... We use RandAugment for all experiments... We follow a similar procedure for the label smoothing value s, though we find the optimal value to be s = 0 for all datasets except TinyImageNet, where it helps significantly at s = 0.9.
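The quoted schedule (linear warmup to epoch 20, cosine annealing restarted from the base rate at epochs 200 and 400 over 600 total epochs) can be sketched as a per-epoch rule; the function name is hypothetical and the warmup is folded into the first cosine cycle, so this is an approximation of the described setup rather than the authors' implementation:

```python
import math

def lr_at_epoch(epoch, base_lr=0.1, total=600, restarts=(0, 200, 400), warmup=20):
    # Linear warmup from 0 to base_lr over the first `warmup` epochs.
    if epoch < warmup:
        return base_lr * epoch / warmup
    # Cosine annealing, restarted to base_lr at each entry in `restarts`.
    cycle_start = max(r for r in restarts if r <= epoch)
    cycle_end = next((r for r in restarts if r > cycle_start), total)
    t = (epoch - cycle_start) / (cycle_end - cycle_start)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

In PyTorch this behavior roughly corresponds to `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` combined with a warmup phase; the standalone function above just makes the restart points explicit.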