Diffusion-based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection
Authors: Suhee Yoon, Sanghyu Yoon, Ye Seul Sim, Sungik Choi, Kyungeun Lee, Hye-Seung Cho, Hankook Lee, Woohyung Lim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our framework, achieving an impressive AUROC of 88% on near-OOD datasets, which surpasses the performance of baseline methods by a significant margin of approximately 6%. |
| Researcher Affiliation | Collaboration | 1 LG AI Research, 2 Sungkyunkwan University, EMAIL, EMAIL |
| Pseudocode | Yes | A detailed pseudo-code implementation of our framework can be found in the Appendix. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions 'See https://huggingface.co/stabilityai/stable-diffusion-2-base for more details' but this is for a third-party model used, not the authors' own implementation. |
| Open Datasets | Yes | We mainly evaluate our framework on ImageNet-200 (Zhang et al. 2023b) as ID, a subset of 200 categories from ImageNet-1k (Deng et al. 2009). Our evaluation covers both far-OOD and challenging near-OOD scenarios. For far-OOD detection, we employ widely-used datasets such as iNaturalist, Texture, and OpenImage-O; for near-OOD detection, SSB-hard and NINCO, which have no class overlap but show close semantic similarity with ImageNet-1K. |
| Dataset Splits | No | The paper references ImageNet-200 as the ID dataset and other datasets for OOD detection, but does not provide specific details on how these datasets were split into training, validation, and test sets for the experimental setup, nor does it refer to standard splits with sufficient clarity for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. It only mentions using the 'Stable Diffusion v2-base model'. |
| Software Dependencies | No | The paper mentions the 'Stable Diffusion v2-base model' but does not provide specific version numbers for other key software components, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) used in the implementation. |
| Experiment Setup | No | The paper describes the loss function and network architecture (ResNet-18), and discusses ranges for some hyperparameters (λ and s) in an ablation study. However, it lacks specific values for critical training hyperparameters such as learning rate, batch size, number of epochs, optimizer details, and the specific β value for the loss function, which are necessary for full reproducibility of the main results. |