Out-of-Distribution Detection with Prototypical Outlier Proxy

Authors: Mingrong Gong, Chaoqi Chen, Qingqiang Sun, Yue Wang, Hui Huang

AAAI 2025

Reproducibility assessment (variable, result, and supporting response):
Research Type: Experimental. Evidence: "Extensive experiments across various benchmarks demonstrate the effectiveness of POP. Notably, POP achieves average FPR95 reductions of 7.70%, 6.30%, and 5.42% over the second-best methods on CIFAR-10, CIFAR-100, and ImageNet-200, respectively. Moreover, compared to the recent method NPOS, which relies on outlier synthesis, POP trains 7.2 times faster and performs inference 19.5 times faster."
Researcher Affiliation: Academia. Evidence: "¹College of Computer Science and Software Engineering, Shenzhen University; ²School of Engineering, Great Bay University; ³Department of Computer Science, University College London"
Pseudocode: Yes. Evidence: "Algorithm 1: The algorithm of POP"
Open Source Code: No. The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets: Yes. Evidence: "Datasets. For comprehensive experiments, we adopt the OpenOOD benchmark (Yang et al. 2022a; Zhang et al. 2023c), which provides an accurate, standardized, and unified evaluation for fair testing. We include small-scale datasets CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009), and the large-scale ImageNet-200, which is a subset of ImageNet-1k (Deng et al. 2009) with the first 200 classes, as our ID datasets. Among them, (i) CIFAR-10 is a small dataset with 10 classes, including 50k training images and 10k test images. We establish the OOD test dataset with CIFAR-100, TinyImageNet (TIN) (Torralba, Fergus, and Freeman 2008), MNIST (Xiao, Rasul, and Vollgraf 2017) (including Fashion-MNIST (Deng 2012)), Texture (Cimpoi et al. 2014), and Places365 (Zhou et al. 2016). (ii) CIFAR-100, another small dataset, consists of 50k training images and 10k test images, with 100 classes. The OOD test dataset includes CIFAR-10, with the remaining datasets configured identically to those in (i). (iii) For the large-scale dataset ImageNet-200, the OOD test dataset consists of SSB (Vaze et al. 2022), NINCO (Bitterwolf, Müller, and Hein 2023), iNaturalist (Van Horn et al. 2018), Places365, and OpenImage-O (Wang et al. 2022)."
Dataset Splits: Yes. Evidence: "(i) CIFAR-10 is a small dataset with 10 classes, including 50k training images and 10k test images. (ii) CIFAR-100, another small dataset, consists of 50k training images and 10k test images, with 100 classes."
Hardware Specification: Yes. Evidence: "Training details. We train a ResNet-18 model (He et al. 2016) from scratch for 100 epochs on CIFAR-10 and CIFAR-100, and 90 epochs on ImageNet-200, using a single Nvidia 4090."
Software Dependencies: No. The paper mentions models like ResNet-18 and optimizers like SGD, but does not specify any software versions for libraries (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup: Yes. Evidence: "Training details. We train a ResNet-18 model (He et al. 2016) from scratch for 100 epochs on CIFAR-10 and CIFAR-100, and 90 epochs on ImageNet-200, using a single Nvidia 4090. Training is performed with the SGD optimizer, a learning rate of 0.1, momentum of 0.9, and weight decay of 0.0005."
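The reported training setup can be collected into a framework-agnostic configuration. A minimal sketch, assuming a dictionary layout of our own choosing (the authors released no code, so key names and structure are ours, not the paper's):

```python
# Hedged summary of the training configuration reported in the paper:
# ResNet-18 trained from scratch with SGD. The dict layout is illustrative.
TRAIN_CONFIG = {
    "model": "ResNet-18",
    "epochs": {"CIFAR-10": 100, "CIFAR-100": 100, "ImageNet-200": 90},
    "optimizer": "SGD",
    "lr": 0.1,            # initial learning rate
    "momentum": 0.9,
    "weight_decay": 5e-4,  # 0.0005 as stated in the paper
}
```

In a PyTorch reproduction attempt, these values would map directly onto `torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)`; any learning-rate schedule would have to be guessed, since the paper excerpt does not state one.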
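FPR95, the headline metric in the results above, is the false positive rate on OOD samples at the threshold where 95% of ID samples are correctly accepted. A minimal sketch of the standard computation (function name and the higher-score-means-more-ID convention are our assumptions, not the paper's):

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR on OOD samples at the threshold giving 95% TPR on ID samples.

    Assumes higher score = more in-distribution.
    """
    # Threshold below which only 5% of ID scores fall, i.e. 95% of ID
    # samples score at or above it (95% true positive rate).
    thresh = np.percentile(id_scores, 5)
    # Fraction of OOD samples mistakenly accepted at that threshold.
    return float(np.mean(np.asarray(ood_scores) >= thresh))
```

For example, perfectly separated scores give an FPR95 of 0.0, while identical ID and OOD score distributions give roughly 0.95; a "7.70% FPR95 reduction" means this quantity drops by 7.70 percentage points relative to the runner-up method.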