Exploring Simple, High Quality Out-of-Distribution Detection with L2 Normalization
Authors: Jarrod Haas, William Yolland, Bernhard T Rabus
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results in Table 1 demonstrate that L2 normalization over feature space during training produces results that compare well with state-of-the-art methods. Notably, L2 normalization not only yields large performance gains over non-normalized baselines, but those gains also arrive much faster (Table 4). Training details for all experiments can be found in Appendix A.1. |
| Researcher Affiliation | Collaboration | Jarrod Haas EMAIL SARlab, Department of Engineering Science, Simon Fraser University; William Yolland EMAIL MetaOptima; Bernhard Rabus EMAIL SARlab, Department of Engineering Science, Simon Fraser University |
| Pseudocode | Yes | Algorithm 1: L2 Normalization of Features — def forward(self, x): z = self.encoder(x); feature_norm = torch.norm(z, dim=1).detach().clone(); z = torch.nn.functional.normalize(z, p=2, dim=1); y = self.fc(z); return y, feature_norm — Figure 1: A PyTorch code snippet illustrating the proposed method |
| Open Source Code | Yes | The Compact Convolutional Transformers were trained from five random initializations with cosine annealing for 300 epochs, in distributed data-parallel mode with batch sizes of 128. The code for these models and the training regime can be found at https://github.com/SHI-Labs/Compact-Transformers. |
| Open Datasets | Yes | AUROC scores for baselines trained on CIFAR10 and tested on far-OoD (SVHN) and near-OoD (CIFAR100) datasets... We study cases where feature norms, as a direct measure of input familiarity, can become more useful with L2 normalization. We show that this is at least the case for several architectures trained on the CIFAR10, CIFAR100 and TinyImageNet datasets. ...trained on the German Traffic Sign Recognition Benchmark (GTSRB) |
| Dataset Splits | Yes | To evaluate models, we merge ID and OoD images into a single test set. OoD performance is then a binary classification task, where we measure how well OoD images can be separated from ID images using a score derived from our model. Our score in this case is the L2 norm of each image's unnormalized feature vector z, which is input to AUROC and FPR95 scoring functions (see Figure 1). ...Plots of feature norms vs softmax scores for all CIFAR10 (blue) and SVHN (orange) test images. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are provided in the paper. The text only mentions 'wherever permitted by GPU RAM'. |
| Software Dependencies | No | The paper mentions a 'PyTorch code snippet', 'torch.nn.functional.normalize', an 'SGD optimizer', an 'AdamW optimizer', and a 'PyTorch implementation' of ConvNeXt, but no specific version numbers for any of these software components are provided. |
| Experiment Setup | Yes | Training employed an SGD optimizer initialized to a learning rate of 1e-1 with gamma=0.1, and with stepdowns at 40 and 50 epochs for 60-epoch models, 75 and 90 epochs for 100-epoch models, and at 150 and 250 epochs for 350-epoch models. All ResNet models use spectral normalization, global average pooling, and Leaky ReLUs... A batch size of 1024 was used wherever permitted by GPU RAM, but LogitNorm models were trained with a batch size of 128 as per the original paper's recommendations (Wei et al., 2022). ResNet50 used a batch size of 768 for CIFAR10 and 512 for TinyImageNet. The Compact Convolutional Transformers were trained from five random initializations with cosine annealing for 300 epochs, in distributed data-parallel mode with batch sizes of 128... It was trained using a single cosine annealing schedule with the AdamW optimizer. |
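The forward pass quoted in the Pseudocode row is short enough to restate outside the table. The sketch below is a NumPy stand-in for the paper's PyTorch snippet (the function name and epsilon guard are mine, not the paper's): keep each sample's raw feature norm as the OoD score, then pass unit-length features on to the classifier head.

```python
import numpy as np

def l2_normalize_features(z, eps=1e-12):
    """Row-wise L2 normalization of a feature batch.

    Mirrors the structure of the paper's forward pass: the
    unnormalized per-sample feature norms are retained (they serve
    as the OoD score), while the L2-normalized features are what a
    classifier head would consume. `eps` guards against division by
    zero and is an implementation detail added here.
    """
    norms = np.linalg.norm(z, axis=1)              # per-sample feature norm
    z_hat = z / np.maximum(norms, eps)[:, None]    # unit-length features
    return z_hat, norms

# Toy batch of two 2-d feature vectors.
z = np.array([[3.0, 4.0],
              [0.0, 2.0]])
z_hat, norms = l2_normalize_features(z)
# norms -> [5.0, 2.0]; every row of z_hat has unit L2 norm
```

In the PyTorch version this corresponds to `torch.norm(z, dim=1)` for the score and `torch.nn.functional.normalize(z, p=2, dim=1)` before the final linear layer.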
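The Dataset Splits row describes the evaluation as binary separation of ID from OoD test images using the feature-norm score, measured with AUROC (and FPR95). A minimal self-contained sketch of the AUROC part, under the assumption that higher norms indicate more ID-like inputs; the pairwise formulation below is my own illustration, not the paper's implementation:

```python
import numpy as np

def auroc(scores_id, scores_ood):
    """AUROC for separating ID from OoD with a scalar score.

    Computed as the probability that a randomly drawn ID score
    exceeds a randomly drawn OoD score, counting ties as one half.
    Equivalent to the area under the ROC curve for the binary
    ID-vs-OoD task described in the paper's evaluation.
    """
    s_id = np.asarray(scores_id, dtype=float)
    s_ood = np.asarray(scores_ood, dtype=float)
    diff = s_id[:, None] - s_ood[None, :]          # all ID/OoD score pairs
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# Perfectly separated scores give AUROC = 1.0; overlap lowers it.
print(auroc([5.0, 4.0], [1.0, 2.0]))   # -> 1.0
```

In practice one would feed the L2 norms of the unnormalized feature vectors for the merged ID+OoD test set into this (or a library routine such as scikit-learn's `roc_auc_score`); FPR95 would additionally threshold the scores at 95% true-positive rate.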