OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Authors: Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li
DMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets (ImageNet) and foundation models (e.g., CLIP and DINOv2), and expands its scope to investigate full-spectrum OOD detection, which considers semantic and covariate distribution shifts at the same time. This work also contributes in-depth analysis and insights derived from comprehensive experimental results, thereby enriching the knowledge pool of OOD detection methodologies. With these enhancements, OpenOOD v1.5 aims to drive advancements and offer a more robust and comprehensive evaluation benchmark for OOD detection research. |
| Researcher Affiliation | Academia | Jingyang Zhang (Duke University); Jingkang Yang (S-Lab, Nanyang Technological University); Pengyun Wang (The Australian National University); Haoqi Wang (EPFL); Yueqian Lin (Duke University); Haoran Zhang (Duke University); Yiyou Sun (University of Wisconsin-Madison); Xuefeng Du (University of Wisconsin-Madison); Yixuan Li (University of Wisconsin-Madison); Ziwei Liu (S-Lab, Nanyang Technological University); Yiran Chen (Duke University); Hai Li (Duke University) |
| Pseudocode | No | The paper describes various methods in Appendix C but does not provide any structured pseudocode or algorithm blocks in the main text or appendices. |
| Open Source Code | Yes | Code: https://github.com/Jingkang50/OpenOOD/ Leaderboard: https://zjysteven.github.io/OpenOOD/ ... We spent great efforts in maximizing reproducibility. Specifically, all training runs can be easily reproduced by running OpenOOD with configuration files. We refer to our online code repo for details, which thoroughly documents all bash training scripts. |
| Open Datasets | Yes | In addition to the small datasets included in v1, OpenOOD v1.5 provides the most extensive experiment results for nearly 40 methods (and their combinations) on ImageNet-1K, which serve as a comprehensive reference for later works. ... CIFAR-10. The first benchmark considers CIFAR-10 (Krizhevsky et al., 2009a) as ID. ... All datasets used in our work are either existing public datasets or subsets that we curate from existing ones. |
| Dataset Splits | Yes | CIFAR-10. The first benchmark considers CIFAR-10 (Krizhevsky et al., 2009a) as ID. We use the official train set with 50,000 samples as D_ID^train and hold out 1,000 samples from the test set to form D_ID^val, while the remaining 9,000 test samples are taken as D_ID^test. ... ImageNet-1K. We use 45,000 images from the ImageNet validation set (Deng et al., 2009) as D_ID^test, while the remaining 5,000 images serve as D_ID^val. |
| Hardware Specification | Yes | CIFAR and ImageNet models are trained using 1 and 2 Quadro RTX 6000 GPUs (24GB memory), respectively. ... The inference time is profiled with a single 24GB GPU, and we report the average results over 5 runs. |
| Software Dependencies | No | The paper mentions 'torchvision' and 'python' implicitly through code snippets or descriptions but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Training. For CIFAR-10/100 and ImageNet-200, we train a ResNet-18 (He et al., 2016) for 100 epochs. We consider the standard cross-entropy training for post-hoc methods. The optimizer is SGD with a momentum of 0.9. We use a learning rate of 0.1 with a cosine annealing decay schedule (Loshchilov and Hutter, 2016). A weight decay of 0.0005 is applied. The batch size is 128 for CIFAR-10/100 and 256 for ImageNet-200. |
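The dataset-split row above describes holding out 1,000 of CIFAR-10's 10,000 official test images for validation and keeping the remaining 9,000 for testing. A minimal sketch of that split, assuming indices are taken in order (the report does not state how the held-out samples are chosen, so the selection here is an assumption):

```python
def split_cifar10_test(test_indices, n_val=1000):
    """Hold out the first n_val indices of the official CIFAR-10 test
    set as a validation split; the rest remain the test split.
    The 'first n_val' choice is an assumption for illustration."""
    return test_indices[:n_val], test_indices[n_val:]

# CIFAR-10's official test set has 10,000 samples.
val_idx, test_idx = split_cifar10_test(list(range(10_000)))
```

This yields the 1,000 / 9,000 split reported in the table.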
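The cosine annealing decay schedule named in the experiment setup (Loshchilov and Hutter, 2016) can be sketched as follows. The base learning rate of 0.1 and the 100-epoch horizon come from the table; the floor of 0 and per-epoch (rather than per-iteration) stepping are assumptions, not stated in the report:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine-annealed learning rate: decays from base_lr to min_lr
    following half a cosine period over total_epochs.
    base_lr=0.1 mirrors the reported setting; min_lr=0.0 is assumed."""
    cosine = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return min_lr + (base_lr - min_lr) * cosine

# Schedule over the 100 epochs used for CIFAR-10/100 and ImageNet-200
lrs = [cosine_annealing_lr(e, 100) for e in range(101)]
# lrs[0] = 0.1 (start), lrs[50] = 0.05 (midpoint), lrs[100] = 0.0 (end)
```

The schedule halves the rate at the midpoint and reaches the floor at the final epoch, which is why it pairs naturally with a fixed epoch budget.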