Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

Authors: Hengzhuang Li, Teng Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "By empirically competing with SOTA baselines on both standard and large-scale benchmarks, we verify the efficacy and efficiency of our proposed HamOS. Our code is available at: https://github.com/Fir-lat/HamOS_OOD. ... We conduct extensive empirical analysis to demonstrate the state-of-the-art (SOTA) performance of HamOS."
Researcher Affiliation | Academia | "Hengzhuang Li, Teng Zhang — School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China"
Pseudocode | Yes | "The pseudo-code is provided in Algorithm 1, with the whole training pipeline displayed in Algorithm 2 in Appendix D."
Open Source Code | Yes | "Our code is available at: https://github.com/Fir-lat/HamOS_OOD."
Open Datasets | Yes | "Following the common practice for benchmarking OOD detection (Zhang et al., 2023b), we use CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and ImageNet-1K (Deng et al., 2009) as ID datasets, and adopt a series of datasets as OOD testing data. For CIFAR ID datasets, we use MNIST (Deng, 2012), SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), Places365 (Zhou et al., 2017), and LSUN (Yu et al., 2015) as OOD testing data; for ImageNet-1K, we use iNaturalist (Van Horn et al., 2018), Textures (Cimpoi et al., 2014), SUN (Xiao et al., 2010), and Places365 (López-Cifuentes et al., 2020) as OOD testing data."
Dataset Splits | No | The paper mentions an "ID training dataset" and "OOD test datasets" and refers to common practice for benchmarking OOD detection. However, it does not explicitly state the split percentages (e.g., 80/10/10) or sample counts for the training, validation, and test sets of any of the mentioned datasets within its own text.
Hardware Specification | Yes | "All experiments in this paper are conducted for multiple runs on a single NVIDIA Tesla V100 Tensor Core with 32GB memory using Python version 3.10.9."
Software Dependencies | Yes | "Python version 3.10.9. The deep learning environment is established using PyTorch version 1.13.1 and Torchvision version 0.14.1 with CUDA 12.2 on Ubuntu 18.04.6."
Experiment Setup | Yes | "For fine-tuning a pretrained model, we set the training epochs to 20 following previous works (Ming et al., 2023b; Tao et al., 2023), with the mini-batch size set to 128 for CIFAR-10/100 and 256 for ImageNet-1K. For ImageNet-1K, we freeze the first three layers of the pretrained model following Ming et al. (2023b) and use cosine annealing as the learning rate schedule beginning at 0.0001. For memory efficiency, we set the size of the class-conditional ID buffer to 1000 for CIFAR-10/100 and 100 for ImageNet-1K. We summarize the default training configurations of HamOS in Table 4." Table 4 (Training Configurations of HamOS): Training epochs 20; Learning rate 0.01; Momentum 0.9; Batch size 128; Weight decay 1.0×10⁻⁴; LR schedule cosine annealing; Prototype update factor 0.95; Buffer size of ID data 1000; Bandwidth κ 2.0; OOD-discernment weight λd 0.1; k for KNN distance 200; Hard margin δ 0.1; Leapfrog steps L 3; Step size ϵ 0.1; Number of adjacent ID clusters Nadj 4; Synthesis rounds R 5.
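For reproduction attempts, the default hyperparameters reported in Table 4 can be collected into a single configuration object. The sketch below is illustrative only: the key names are hypothetical (they are not taken from the authors' repository), while the values are those stated in the paper.

```python
# Hypothetical configuration sketch of HamOS's Table 4 defaults.
# Key names are invented for illustration; values come from the paper.
hamos_default_config = {
    "training_epochs": 20,
    "learning_rate": 0.01,
    "momentum": 0.9,
    "batch_size": 128,              # paper uses 256 for ImageNet-1K
    "weight_decay": 1.0e-4,
    "lr_schedule": "cosine_annealing",
    "prototype_update_factor": 0.95,
    "id_buffer_size": 1000,         # paper uses 100 for ImageNet-1K
    "bandwidth_kappa": 2.0,
    "ood_discernment_weight": 0.1,  # λd
    "knn_k": 200,
    "hard_margin": 0.1,             # δ
    "leapfrog_steps": 3,            # L
    "step_size": 0.1,               # ϵ
    "num_adjacent_id_clusters": 4,  # Nadj
    "synthesis_rounds": 5,          # R
}
```

Keeping these values in one dictionary makes it straightforward to check a reimplementation against the reported defaults before varying any of them.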