A Unified Approach Towards Active Learning and Out-of-Distribution Detection
Authors: Sebastian Schmidt, Leonard Schenk, Leo Schwinn, Stephan Günnemann
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments showing the problems arising when migrating between both tasks. In our experiments, SISOM underlined its effectiveness by achieving first place in one of the commonly used OpenOOD benchmark settings and top-3 places in the remaining two for near-OOD data. In AL, SISOM delivers top performance in common image benchmarks. (...) To evaluate the abilities of SISOM for the real-world application life cycle (Fig. 2), we conducted comprehensive experiments on AL and OOD detection individually. |
| Researcher Affiliation | Collaboration | Sebastian Schmidt (Technical University of Munich; BMW Group); Leonard Schenk (Sprin-D); Leo Schwinn (Technical University of Munich); Stephan Günnemann (Technical University of Munich) |
| Pseudocode | No | The paper describes its methodology using mathematical equations and descriptive text, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | 1Project Page with Code: https://www.cs.cit.tum.de/daml/sisom |
| Open Datasets | Yes | We utilize the commonly used closed-set pool-based AL scenario (Settles, 2010) for AL. For OOD detection, we employ the extensive OpenOOD benchmark (Yang et al., 2022; Zhang et al., 2023). (...) CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), as well as SVHN (Netzer et al., 2011) (...) OpenOOD framework (Yang et al., 2022; Zhang et al., 2023) to evaluate on the OOD detection task. We employ the recommended benchmarks on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and ImageNet-1k (Deng et al., 2009) |
| Dataset Splits | Yes | We selected our query sizes accordingly and followed the commonly suggested sizes of q = 1000 for CIFAR-10 (Yoo & Kweon, 2019; Lüth et al., 2023) and q = 2000 for CIFAR-100 (Caramalau et al., 2021). (...) The initial set size and the query size are set to 250, as recommended by Lüth et al. (2023) (...) In the experiments conducted with RS, a representative subset size of 10% relative to the original training set was used across all experiments. (...) For the full-cycle OOD experiments, we used three seeds, as is common for OpenOOD, from a CIFAR-10 AL cycle. The three seeds are taken from a SISOMe training and used as checkpoints for all OOD detection methods. Besides providing different checkpoints, we followed the OpenOOD benchmark procedure for CIFAR-10 with the same data splits. |
| Hardware Specification | Yes | Time is measured on a desktop PC with A6000 and 64GB of RAM. (...) Time is measured on a Kubernetes pod with V100 and 24GB of RAM from a Nvidia DGX machine. |
| Software Dependencies | No | The paper mentions several frameworks and models like ResNet18, ResNet34, the SGD optimizer, a cosine scheduler, SimCLR, and SCAN, but does not provide specific version numbers for any software libraries (e.g., PyTorch, Python, CUDA, etc.). |
| Experiment Setup | Yes | In AL experiments, we used ResNet18 and ResNet34 (He et al., 2016) models, with the suggested modifications of Yoo & Kweon (2019) presented in a CIFAR benchmark repository (kuangliu, 2021), which replaced the kernel of the first convolution with a 3x3 kernel. Additionally, we used an SGD optimizer with a learning rate of 0.1 and multistep scheduling at epochs 60, 120, and 160, decreasing the learning rate by a factor of 10, which are reported benchmark parameters for CIFAR-100 (weiaicunzai, 2022). For SVHN and CIFAR-10 we used a learning rate of 0.025 and a cosine scheduler, as suggested by Yehuda et al. (2022). For the construction of the feature space, we used the layers after the 4 blocks of ResNet with sigmoid steepness parameters as reported in Table 8. (...) For the CIFAR-100 experiment, instead of using the automated ravg value to balance between r and E from Eq. (11), we set ravg = 0.8 for SISOMe based on a hyperparameter study. In the benchmark tables, we reported for SISOM the best values, matching those of the ablation study modifications. Furthermore, we follow the suggested sigmoid steepness parameters (Liu et al., 2024) for CIFAR-10 and ImageNet. For CIFAR-100, we choose values that minimize ravg. A detailed overview of the sigmoid steepness parameters for the 4 blocks of ResNet18 and ResNet50 for all experiments is provided in Table 9. |
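The learning-rate schedules quoted in the Experiment Setup row can be sketched as plain functions for anyone checking reproducibility: multistep decay at epochs 60/120/160 by a factor of 10 from a base rate of 0.1 (CIFAR-100), and cosine annealing from 0.025 (CIFAR-10/SVHN). This is a minimal sketch; the base rates and milestones come from the quoted setup, while the total epoch count for the cosine schedule is an assumed placeholder not stated in the excerpt.

```python
import math


def multistep_lr(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.1):
    """Learning rate at a given epoch under multistep decay.

    Matches the quoted CIFAR-100 setting: lr 0.1, decayed by 10x
    at epochs 60, 120, and 160.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr


def cosine_lr(epoch, base_lr=0.025, total_epochs=200):
    """Cosine-annealed learning rate (quoted CIFAR-10/SVHN setting).

    NOTE: total_epochs=200 is an assumption for illustration; the
    paper excerpt does not state the training length.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

In a PyTorch training loop these would correspond to `torch.optim.lr_scheduler.MultiStepLR` and `CosineAnnealingLR` attached to the SGD optimizer, respectively.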