Principled Out-of-Distribution Detection via Multiple Testing

Authors: Akshayaa Magesh, Venugopal V. Veeravalli, Anirban Roy, Susmit Jha

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural network architectures.
Researcher Affiliation Collaboration Akshayaa Magesh (EMAIL), Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA; Venugopal V. Veeravalli (EMAIL), Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA; Anirban Roy (EMAIL), Computer Science Laboratory, SRI International, Menlo Park, CA 94061; Susmit Jha (EMAIL), Computer Science Laboratory, SRI International, Menlo Park, CA 94061
Pseudocode Yes Algorithm 1: BH-based OOD detection test with conformal p-values; Algorithm 2: Bonferroni-based OOD detection test with conformal p-values
Open Source Code No The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets Yes For CIFAR10 as the in-distribution dataset, we study SVHN, LSUN, ImageNet, and iSUN as OOD datasets. For SVHN as the in-distribution dataset, we study LSUN, ImageNet, CIFAR10 and iSUN as OOD datasets.
Dataset Splits Yes The calibration dataset in each case is a subset of 5000 samples of the in-distribution training dataset. We use a subset of 45000 points from the training dataset (with no overlap with the calibration dataset) to calculate the class-wise empirical means and shared covariance for the Mahalanobis scores, and the minimum and maximum correlations for the Gram scores.
Hardware Specification Yes All experiments presented in this paper were run on a single NVIDIA GTX-1080Ti GPU with PyTorch.
Software Dependencies No The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with version numbers.
Experiment Setup Yes In our experiments, we set the temperature parameter T to 100 for all in-distribution datasets, DNN architectures and OOD datasets (as stated by Liu et al. (2020), the energy score is not sensitive to the temperature parameter).
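
The rows above mention conformal p-values, a BH-based test, a Bonferroni-based test, and an energy score with temperature T = 100. Since no source code is released, the following is only a minimal sketch of how these pieces could fit together, not the authors' implementation. It assumes that a larger score means "more OOD-like", that each statistic has its own calibration scores computed on in-distribution data, and that the sample is declared OOD when the multiple-testing procedure rejects at least one of the K hypotheses. All function names are hypothetical.

```python
import math


def energy_score(logits, temperature=100.0):
    """Energy score in the sense of Liu et al. (2020): -T * logsumexp(logits / T).

    Assumption: higher energy is treated as more OOD-like.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def conformal_p_value(calib_scores, test_score):
    """Conformal p-value: fraction of calibration scores at least as
    extreme as the test score, with the usual +1 correction."""
    n = len(calib_scores)
    at_least = sum(1 for s in calib_scores if s >= test_score)
    return (1 + at_least) / (n + 1)


def bh_ood_test(p_values, alpha):
    """Benjamini-Hochberg style test: declare OOD if the i-th smallest
    p-value is at most alpha * i / K for some i."""
    k = len(p_values)
    ordered = sorted(p_values)
    return any(p <= alpha * i / k for i, p in enumerate(ordered, start=1))


def bonferroni_ood_test(p_values, alpha):
    """Bonferroni test: declare OOD if the smallest p-value is at most alpha / K."""
    return min(p_values) <= alpha / len(p_values)


# Toy usage with made-up p-values from K = 3 statistics.
p_vals = [0.03, 0.03, 0.9]
print(bh_ood_test(p_vals, alpha=0.05))          # BH decision
print(bonferroni_ood_test(p_vals, alpha=0.05))  # Bonferroni decision
```

In this toy case the BH-style test flags the sample while Bonferroni does not, which illustrates why the Bonferroni variant is generally the more conservative of the two.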