Mahalanobis++: Improving OOD Detection via Feature Normalization
Authors: Maximilian Müller, Matthias Hein
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 44 models across diverse architectures and pretraining schemes show that ℓ2-normalization improves the conventional Mahalanobis distance-based approaches significantly and consistently, and outperforms other recently proposed OOD detection methods. Code is available at github.com/mueller-mp/maha-norm. (...) 5. Experiments: ImageNet. Our main goal is to investigate the effectiveness of Mahalanobis++ across a large pool of architectures, model sizes and training schemes for ImageNet-scale OOD detection, as this is where the conventional Mahalanobis distance showed the most varied results in previous studies (...) We report the false positive rate at a true positive rate of 95% (FPR) as the OOD detection metric and refer to the appendix for other metrics, such as AUC, details on the model checkpoints, baseline methods, and extended results. |
| Researcher Affiliation | Academia | 1University of Tübingen and Tübingen AI Center. Correspondence to: Maximilian Müller <EMAIL>. |
| Pseudocode | No | The paper describes the methodology and evaluation steps using mathematical formulations and textual descriptions (e.g., Section 3.1 Mahalanobis Distance, equations 1-4) but does not include a distinct 'Pseudocode' or 'Algorithm' block with structured, code-like steps. |
| Open Source Code | Yes | Code is available at github.com/mueller-mp/maha-norm. |
| Open Datasets | Yes | Extensive experiments on 44 models across diverse architectures and pretraining schemes show that ℓ2-normalization improves the conventional Mahalanobis distance-based approaches significantly and consistently, and outperforms other recently proposed OOD detection methods. Code is available at github.com/mueller-mp/maha-norm. (...) Following the OpenOOD setup (Yang et al., 2022), we report results on NINCO (Bitterwolf et al., 2023), iNaturalist (Van Horn et al., 2018), SSB-hard (Vaze et al., 2022), OpenImages-O (Krasin et al., 2017) and Texture (Cimpoi et al., 2014). (...) We investigate Mahalanobis++ on CIFAR100 (Krizhevsky, 2009), following the OpenOOD setup with TinyImageNet (Le & Yang, 2015), MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011), Texture (Cimpoi et al., 2014), Places (Zhou et al., 2017) and CIFAR10 as OOD datasets for a range of architectures and training schemes. |
| Dataset Splits | Yes | Following the OpenOOD setup (Yang et al., 2022), we report results on NINCO (Bitterwolf et al., 2023), iNaturalist (Van Horn et al., 2018), SSB-hard (Vaze et al., 2022), OpenImages-O (Krasin et al., 2017) and Texture (Cimpoi et al., 2014). (...) If s_Maha(x_t) < T then the sample is rejected as OOD, where for evaluation purposes T is typically determined by fixing a TPR of 95% on the in-distribution. (...) Given the training set (x_i, y_i), i = 1, …, n, with input x_i and class labels y_i (...) OOD test samples (i.e. samples that were not used for estimating means and covariance). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU, CPU models, or memory specifications) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions using 'publicly available model checkpoints from timm (Wightman, 2019) and huggingface.co' but does not specify version numbers for these or any other software dependencies (e.g., Python, PyTorch, CUDA) required for reproducibility. |
| Experiment Setup | Yes | If s_Maha(x_t) < T then the sample is rejected as OOD, where for evaluation purposes T is typically determined by fixing a TPR of 95% on the in-distribution. (...) Like suggested in (Sun et al., 2022), we use K = 1000. (...) As suggested in (Wang et al., 2022), we set the threshold r such that 1% of the activations from the train set would be truncated. (...) Like suggested by the authors, we use 1% of the train features and K = 10 neighbors for ImageNet experiments. (...) Like suggested in (Wang et al., 2022), we use D = 1000 if the dimensionality of the feature space d satisfies d ≥ 2048, D = 512 if 768 ≤ d < 2048, and D = d/2 rounded to an integer otherwise. |
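The recipe the table keeps quoting, ℓ2-normalizing features before fitting per-class means and a shared covariance, then thresholding the resulting Mahalanobis score at 95% TPR on the in-distribution, can be sketched as follows. This is an illustrative reimplementation based only on the excerpts above, not the authors' released code (see github.com/mueller-mp/maha-norm for that); all function and variable names here are our own.

```python
import numpy as np

def fit_mahalanobis(features, labels, l2_normalize=True):
    """Fit per-class means and a shared (pooled) covariance on train features.

    With l2_normalize=True this corresponds to the Mahalanobis++ idea of
    projecting features onto the unit sphere before the Gaussian fit.
    """
    if l2_normalize:
        features = features / np.linalg.norm(features, axis=1, keepdims=True)
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample by its own class mean, then pool one covariance.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.pinv(cov)
    return means, precision

def maha_score(x, means, precision, l2_normalize=True):
    """Negative minimal squared Mahalanobis distance to any class mean.

    Higher score = more in-distribution, matching the s_Maha(x_t) < T
    rejection rule quoted in the table.
    """
    if l2_normalize:
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
    diffs = x[:, None, :] - means[None, :, :]            # shape (n, C, d)
    d2 = np.einsum('ncd,de,nce->nc', diffs, precision, diffs)
    return -d2.min(axis=1)

def fpr_at_95_tpr(scores_id, scores_ood):
    """FPR at 95% TPR: threshold so 95% of ID scores are accepted,
    then report the fraction of OOD scores that still pass."""
    thresh = np.quantile(scores_id, 0.05)
    return float((scores_ood >= thresh).mean())
```

Fitting on train features, scoring held-out ID and OOD features, and calling `fpr_at_95_tpr` reproduces the evaluation protocol the table describes; swapping `l2_normalize=False` recovers the conventional Mahalanobis baseline the paper compares against.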