OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Authors: Jingyang Zhang, Jingkang Yang, Pengyun Wang, Haoqi Wang, Yueqian Lin, Haoran Zhang, Yiyou Sun, Xuefeng Du, Yixuan Li, Ziwei Liu, Yiran Chen, Hai Li
DMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets (ImageNet) and foundation models (e.g., CLIP and DINOv2), and expands its scope to investigate full-spectrum OOD detection, which considers semantic and covariate distribution shifts at the same time. This work also contributes in-depth analysis and insights derived from comprehensive experimental results, thereby enriching the knowledge pool of OOD detection methodologies. With these enhancements, OpenOOD v1.5 aims to drive advancements and offer a more robust and comprehensive evaluation benchmark for OOD detection research. |
| Researcher Affiliation | Academia | Jingyang Zhang (Duke University); Jingkang Yang (S-Lab, Nanyang Technological University); Pengyun Wang (The Australian National University); Haoqi Wang (EPFL); Yueqian Lin (Duke University); Haoran Zhang (Duke University); Yiyou Sun (University of Wisconsin-Madison); Xuefeng Du (University of Wisconsin-Madison); Yixuan Li (University of Wisconsin-Madison); Ziwei Liu (S-Lab, Nanyang Technological University); Yiran Chen (Duke University); Hai Li (Duke University) |
| Pseudocode | No | The paper describes various methods in Appendix C but does not provide any structured pseudocode or algorithm blocks in the main text or appendices. |
| Open Source Code | Yes | Code: https://github.com/Jingkang50/OpenOOD/ Leaderboard: https://zjysteven.github.io/OpenOOD/ ... We spent great efforts in maximizing reproducibility. Specifically, all training runs can be easily reproduced by running OpenOOD with configuration files. We refer to our online code repo for details, which thoroughly documents all bash training scripts. |
| Open Datasets | Yes | In addition to the small datasets included in v1, OpenOOD v1.5 provides the most extensive experiment results for nearly 40 methods (and their combinations) on ImageNet-1K, which serve as a comprehensive reference for later works. ... CIFAR-10. The first benchmark considers CIFAR-10 (Krizhevsky et al., 2009a) as ID. ... All datasets used in our work are either existing public datasets or subsets that we curate from existing ones. |
| Dataset Splits | Yes | CIFAR-10. The first benchmark considers CIFAR-10 (Krizhevsky et al., 2009a) as ID. We use the official train set with 50,000 samples as D_ID^train and hold out 1,000 samples from the test set to form D_ID^val, while the remaining 9,000 test samples are taken as D_ID^test. ... ImageNet-1K. We use 45,000 images from the ImageNet validation set (Deng et al., 2009) as D_ID^test, while the remaining 5,000 images serve as D_ID^val. |
| Hardware Specification | Yes | CIFAR and ImageNet models are trained using 1 and 2 Quadro RTX 6000 GPUs (24GB memory), respectively. ... The inference time is profiled with a single 24GB GPU, and we report the average results over 5 runs. |
| Software Dependencies | No | The paper mentions 'torchvision' and 'python' implicitly through code snippets or descriptions but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Training. For CIFAR-10/100 and ImageNet-200, we train a ResNet-18 (He et al., 2016) for 100 epochs. We consider the standard cross-entropy training for post-hoc methods. The optimizer is SGD with a momentum of 0.9. We use a learning rate of 0.1 with a cosine annealing decay schedule (Loshchilov and Hutter, 2016). A weight decay of 0.0005 is applied. The batch size is 128 for CIFAR-10/100 and 256 for ImageNet-200. |
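The dataset-split row above describes holding out 1,000 of CIFAR-10's 10,000 official test images for validation and keeping the remaining 9,000 for testing. A minimal sketch of that split, assuming indices are taken in order (the report does not state how the held-out samples are chosen, so the selection here is an assumption):

```python
def split_cifar10_test(test_indices, n_val=1000):
    """Hold out the first n_val indices of the official CIFAR-10 test
    set as a validation split; the rest remain the test split.
    The 'first n_val' choice is an assumption for illustration."""
    return test_indices[:n_val], test_indices[n_val:]

# CIFAR-10's official test set has 10,000 samples.
val_idx, test_idx = split_cifar10_test(list(range(10_000)))
```

This yields the 1,000 / 9,000 split reported in the table.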
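The cosine annealing decay schedule named in the experiment setup (Loshchilov and Hutter, 2016) can be sketched as follows. The base learning rate of 0.1 and the 100-epoch horizon come from the table; the floor of 0 and per-epoch (rather than per-iteration) stepping are assumptions, not stated in the report:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, base_lr=0.1, min_lr=0.0):
    """Cosine-annealed learning rate: decays from base_lr to min_lr
    following half a cosine period over total_epochs.
    base_lr=0.1 mirrors the reported setting; min_lr=0.0 is assumed."""
    cosine = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return min_lr + (base_lr - min_lr) * cosine

# Schedule over the 100 epochs used for CIFAR-10/100 and ImageNet-200
lrs = [cosine_annealing_lr(e, 100) for e in range(101)]
# lrs[0] = 0.1 (start), lrs[50] = 0.05 (midpoint), lrs[100] = 0.0 (end)
```

The schedule halves the rate at the midpoint and reaches the floor at the final epoch, which is why it pairs naturally with a fixed epoch budget.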