A Variational Information Theoretic Approach to Out-of-Distribution Detection

Authors: Sudeepta Mondal, Zhuolin Jiang, Ganesh Sundaramoorthi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments. We validate our theory by comparing our new shaping function to SoA for OOD detection on standard benchmarks. Datasets and model architectures. We experiment with ResNet-50 (He et al., 2016), MobileNet-v2 (Sandler et al., 2018), and the vision transformers ViT-B-16 and ViT-L-16 (Dosovitskiy et al., 2021), with ImageNet-1k (Russakovsky et al., 2015) as ID data, and benchmark on the OOD datasets/methods used in (Zhao et al., 2024). For the ImageNet benchmark, we evaluate performance across eight OOD datasets: Species (Hendrycks et al., 2022), iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018), OpenImage-O (Wang et al., 2022), ImageNet-O (Hendrycks et al., 2021), Texture (Cimpoi et al., 2014), and MNIST (Deng, 2012). Moreover, we also experiment with CIFAR-10 and CIFAR-100 as ID data, for which we use a ViT-B-16 (Dosovitskiy et al., 2021) finetuned on CIFAR-10/100, consistent with (Fort et al., 2021a), and an MLPMixer-Nano model trained on CIFAR-10/100 from scratch. We evaluate eight OOD datasets: TinyImageNet (Torralba et al., 2008), SVHN (Netzer et al., 2011), Texture (Cimpoi et al., 2014), Places365 (Zhou et al., 2018), LSUN-Cropped (Yu et al., 2016), LSUN-Resized (Yu et al., 2016), iSUN (Xu et al., 2015), and CIFAR-100/CIFAR-10 (CIFAR-100 treated as OOD for CIFAR-10, and vice versa). Metrics. We use two standard evaluation metrics, following (Sun et al., 2021; Zhao et al., 2024): FPR95, the false positive rate when the true positive rate is 95% (abbreviated FP), and the area under the ROC curve (abbreviated AU). Results. The results on the ImageNet-1k benchmarks (Table 1) and the CIFAR-10/100 benchmarks (Table 2) demonstrate that our approach achieves state-of-the-art performance among comparable feature-shaping methods.
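The two metrics cited in this row (FPR95 and AUROC) can be computed directly from detector scores. Below is a minimal stdlib sketch, assuming the common convention that higher scores mean "more ID-like"; the function names and score lists are illustrative, not from the paper:

```python
def fpr_at_tpr(id_scores, ood_scores, tpr_target=0.95):
    """FPR95: false positive rate at the threshold where tpr_target
    (e.g. 95%) of ID samples are correctly retained."""
    id_sorted = sorted(id_scores)
    # Threshold at the (1 - tpr_target) quantile of ID scores.
    idx = int((1 - tpr_target) * len(id_sorted))
    thresh = id_sorted[idx]
    # Fraction of OOD samples scoring at or above the threshold.
    return sum(s >= thresh for s in ood_scores) / len(ood_scores)

def auroc(id_scores, ood_scores):
    """Area under the ROC curve via the Mann-Whitney U formulation:
    probability a random ID sample outscores a random OOD sample,
    with ties counted as 0.5."""
    wins = 0.0
    for i in id_scores:
        for o in ood_scores:
            wins += 1.0 if i > o else (0.5 if i == o else 0.0)
    return wins / (len(id_scores) * len(ood_scores))
```

A perfectly separated pair of score lists gives AUROC 1.0 and FPR95 0.0; overlapping scores move both toward their chance values.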
Researcher Affiliation | Industry | Sudeepta Mondal, Zhuolin Jiang, and Ganesh Sundaramoorthi: RTX Technology Research Center (RTRC), East Hartford, CT 06118. Correspondence to: Ganesh Sundaramoorthi <EMAIL>.
Pseudocode | Yes | Algorithm 1: 1D Gaussian Random Feature Computation
Input: ID/OOD distributions p(z_i|y), α, β, and learning rate η
Output: converged mean µ_i and std σ_{c,i} for each i
Initialize: µ_i = z_i, σ_{c,i} = const
for n iterations do
  Compute a discretization of z̃_i in its likely range: z̃_i^j ∈ (µ_i − kσ_{c,i}, µ_i + kσ_{c,i}), where k ≈ 3
  for each z̃_i^j do
    Compute p(z̃_i^j|z_i) L(z̃_i^j, z_i) = p(z_i|0)[l(z_i) log l(z̃_i^j) − l(z̃_i^j)] − p(z_i|1)[(l(z_i) − 1) log l(z̃_i^j) + l(z̃_i^j) − 1] + Σ_{y∈{0,1}} p(y) p(z_i|y) [log(p(z̃_i^j|z_i)/p(z̃_i^j)) − β log(p(z̃_i^j|y)/p(z̃_i^j))]
  end for
  Compute ∇_µ L(z_i) = Σ_j L(z̃_i^j, z_i) (z̃_i^j − µ_i)/σ_{c,i}² p(z̃_i^j|z_i) Δz̃_i
  Compute ∇_{σ_c} L(z_i) = Σ_j L(z̃_i^j, z_i) [(z̃_i^j − µ_i)²/σ_{c,i}² − 1]/σ_{c,i} p(z̃_i^j|z_i) Δz̃_i
  for each z_i do
    µ_i ← µ_i − η ∇_µ L(z_i)
    σ_{c,i} ← σ_{c,i} − η ∇_{σ_c} L(z_i)
  end for
end for
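The quoted algorithm can be sketched as a runnable routine. This is an illustrative skeleton only: the paper's variational objective is replaced by a placeholder `loss_fn`, the channel density p(z̃|z) is assumed to be N(µ, σ_c²), and the gradient expressions follow the score-function form of the pseudocode above:

```python
import math

def optimize_channel(z, loss_fn, lr=0.01, n_iters=100, k=3.0, n_grid=50):
    """Skeleton of the 1D Gaussian channel fit for one feature value z.
    loss_fn(z_tilde, z) stands in for the paper's objective L; the channel
    p(z_tilde|z) is assumed Gaussian with parameters (mu, sigma)."""
    mu, sigma = z, 1.0  # Initialize: mu_i = z_i, sigma_c,i = const
    for _ in range(n_iters):
        # Discretize z_tilde over its likely range (mu - k*sigma, mu + k*sigma).
        lo, hi = mu - k * sigma, mu + k * sigma
        dz = (hi - lo) / n_grid
        grid = [lo + (j + 0.5) * dz for j in range(n_grid)]
        g_mu = g_sigma = 0.0
        for zt in grid:
            p = math.exp(-0.5 * ((zt - mu) / sigma) ** 2) / (
                sigma * math.sqrt(2 * math.pi))
            L = loss_fn(zt, z)
            # Score-function gradients of E[L] w.r.t. mu and sigma,
            # matching the two "Compute" lines of the pseudocode.
            g_mu += L * (zt - mu) / sigma ** 2 * p * dz
            g_sigma += L * ((zt - mu) ** 2 / sigma ** 2 - 1.0) / sigma * p * dz
        mu -= lr * g_mu
        sigma = max(sigma - lr * g_sigma, 1e-3)  # keep sigma positive
    return mu, sigma
```

With a zero loss the gradients vanish and the parameters stay at their initialization, which is a quick sanity check on the update rule.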
Open Source Code | No | The paper does not contain any explicit statement about providing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Datasets and model architectures. We experiment with ResNet-50 (He et al., 2016), MobileNet-v2 (Sandler et al., 2018), and the vision transformers ViT-B-16 and ViT-L-16 (Dosovitskiy et al., 2021), with ImageNet-1k (Russakovsky et al., 2015) as ID data, and benchmark on the OOD datasets/methods used in (Zhao et al., 2024). For the ImageNet benchmark, we evaluate performance across eight OOD datasets: Species (Hendrycks et al., 2022), iNaturalist (Horn et al., 2018), SUN (Xiao et al., 2010), Places (Zhou et al., 2018), OpenImage-O (Wang et al., 2022), ImageNet-O (Hendrycks et al., 2021), Texture (Cimpoi et al., 2014), and MNIST (Deng, 2012). Moreover, we also experiment with CIFAR-10 and CIFAR-100 as ID data, for which we use a ViT-B-16 (Dosovitskiy et al., 2021) finetuned on CIFAR-10/100, consistent with (Fort et al., 2021a), and an MLPMixer-Nano model trained on CIFAR-10/100 from scratch. We evaluate eight OOD datasets: TinyImageNet (Torralba et al., 2008), SVHN (Netzer et al., 2011), Texture (Cimpoi et al., 2014), Places365 (Zhou et al., 2018), LSUN-Cropped (Yu et al., 2016), LSUN-Resized (Yu et al., 2016), iSUN (Xu et al., 2015), and CIFAR-100/CIFAR-10 (CIFAR-100 treated as OOD for CIFAR-10, and vice versa).
Dataset Splits | Yes | As in ReAct (Sun et al., 2021), for ImageNet-1k benchmarks we use a validation set comprising the validation split of ImageNet-1k as ID data and Gaussian noise images as OOD data, generated by sampling from N(0, 1) at each pixel location, to tune the hyperparameters of our piecewise-linear activation shaping function. For CIFAR-10/100 benchmarks, following ODIN (Liang et al., 2020), we employ a random subset of the iSUN dataset (Xu et al., 2015) as validation OOD data for our hyperparameter tuning. As ID validation data for CIFAR-10/100 we use the test splits of the corresponding datasets.
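The Gaussian-noise OOD validation images described in this row are straightforward to reproduce. A framework-agnostic stdlib sketch (the function name and flat-list representation are illustrative; in practice one would build a tensor in PyTorch):

```python
import random

def gaussian_noise_image(h=224, w=224, c=3, seed=None):
    """Synthetic OOD validation image per the quoted protocol: every
    pixel value drawn i.i.d. from N(0, 1). Returned as a flat list of
    h*w*c floats for simplicity."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(h * w * c)]
```

Fixing the seed makes the validation OOD set reproducible across hyperparameter-tuning runs.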
Hardware Specification | Yes | The complexity for this optimization (which is done offline in training) is O(NMK), where N is the number of samples of p(z|y), M is the number of samples of p(z̃|z), and K is the number of gradient-descent iterations. On a single A100 GPU, this took less than a minute. The inference cost of our feature-shaping method is on the order of microseconds for a 256×256×3 image, using PyTorch on an NVIDIA A100 80GB GPU.
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number. It also implicitly uses CUDA (NVIDIA A100 GPU), but no version is specified.
Experiment Setup | Yes | The hyperparameters are optimized using Bayesian optimization (Frazier, 2018) by minimizing the FPR95 metric on the validation set. The resulting hyperparameters are reported in Appendix G, where Table 3, titled "Our optimal hyperparameters for different models and datasets," lists specific values for y0, y1a, z1, y1b, m1, z2, and m2.
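The tuning loop described in this row pairs an objective (validation FPR95) with a search over the shaping-function hyperparameters. As a hedged stand-in for the Bayesian optimization the paper uses, the sketch below does plain random search; `fpr95_fn` and `space` are placeholders, and the parameter names echo those listed in Appendix G:

```python
import random

def tune_hyperparams(fpr95_fn, space, n_trials=50, seed=0):
    """Random-search stand-in for Bayesian optimization: sample candidate
    hyperparameter dicts from uniform ranges and keep the one with the
    lowest validation FPR95. `space` maps names to (low, high) ranges."""
    rng = random.Random(seed)
    best, best_fpr = None, float("inf")
    for _ in range(n_trials):
        cand = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        fpr = fpr95_fn(cand)  # evaluate candidate on the validation set
        if fpr < best_fpr:
            best, best_fpr = cand, fpr
    return best, best_fpr
```

A Bayesian optimizer would replace the uniform sampling with a surrogate-model-guided proposal, but the evaluate-and-keep-best loop is the same.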