A General-Purpose Multi-Modal OOD Detection Framework
Authors: Viet Quoc Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Wen-Ling Hsu, Han Zhao, Huajie Shao
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on multiple benchmarks demonstrate that the proposed WOOD significantly outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach can achieve superior detection performance in a variety of OOD scenarios. (...) 5 Experiments: In this section, we carry out extensive experimentation to evaluate the performance of the proposed WOOD model on multiple benchmark datasets. Then, we conduct ablation studies to explore how the main components in model design and hyperparameters impact OOD detection performance. |
| Researcher Affiliation | Collaboration | Viet Duong, Department of Computer Science, William & Mary; Qiong Wu, AT&T Labs; Zhengyi Zhou, AT&T Labs; Eric Zavesky, AT&T Labs; Wen-Ling Hsu, AT&T Labs; Han Zhao, Department of Computer Science, University of Illinois at Urbana-Champaign; Huajie Shao, Department of Computer Science, William & Mary |
| Pseudocode | Yes | Algorithm 1 The Proposed WOOD Model |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We implement experiments on the three real-world datasets: CUB-200 (Wah et al., 2011), MIMIC-CXR (Johnson et al., 2019), and COCO (Lin et al., 2014). For CUB-200, the textual information comes from literature (Reed et al., 2016). COCO and MIMIC-CXR contain images and their textual descriptions. |
| Dataset Splits | Yes | To improve detection performance on three scenarios simultaneously, we randomly select 1% of the training data and replace them with labeled OOD samples for each scenario, as described in Tab. 1, resulting in a total of 3% OOD samples and 97% of ID samples for weakly-supervised training. During evaluation, we use the same ratio (25%) of test samples for ID and three OOD scenarios. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | We train the proposed WOOD model using Adam optimizer (Kingma & Ba, 2014)... The paper mentions the Adam optimizer and its authors, but does not specify software versions for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We set its hidden size to 512, the same as the dimensions of the output embeddings from CLIP-image and CLIP-text. The Binary Classifier is a 3-layer fully connected network with ReLU activation, which outputs a single probability score for binary OOD classification; the layer hidden sizes are 1024, 512, and 128 respectively. We train the proposed WOOD model using the Adam optimizer (Kingma & Ba, 2014) with initial learning rate 5e-6 and a stepped learning rate schedule. Additionally, the batch size is set to 128 in all experiments, and λ = 0.8 for the overall training objective in Eq. (7). Regarding the margin of the Hinge loss, we choose m = 0.2 for CUB-200 and MIMIC-CXR, and m = 0.3 for COCO after grid search. |
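The weak-supervision split quoted in the Dataset Splits row (1% of training data replaced with labeled OOD samples per scenario, three scenarios, hence 3% OOD and 97% ID) can be sketched in plain Python. The function name and return format below are illustrative, not from the paper:

```python
def weakly_supervised_split(n_train, n_scenarios=3, ood_frac_per_scenario=0.01):
    """Replace a small fraction of ID training samples with labeled OOD
    samples: 1% per OOD scenario across three scenarios, yielding
    3% OOD and 97% ID samples overall, as described in the paper."""
    n_ood_per_scenario = round(n_train * ood_frac_per_scenario)
    n_ood_total = n_ood_per_scenario * n_scenarios
    n_id = n_train - n_ood_total
    return {
        "id": n_id,
        "ood_per_scenario": n_ood_per_scenario,
        "ood_total": n_ood_total,
    }

split = weakly_supervised_split(10_000)
# 100 OOD samples per scenario, 300 OOD in total, 9,700 ID samples
```

At evaluation time the paper instead uses equal 25% shares of test samples for ID and each of the three OOD scenarios.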
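The hyperparameters quoted in the Experiment Setup row can be collected into a single config for reimplementation. The exact form of the Hinge term in Eq. (7) is not quoted above, so the `hinge_loss` below uses the standard margin formulation max(0, m − y·s) with y = +1 for ID and y = −1 for OOD; treat that choice as an assumption:

```python
# Hyperparameters reported in the Experiment Setup row.
WOOD_CONFIG = {
    "embed_dim": 512,                       # matches CLIP image/text output embeddings
    "classifier_hidden": (1024, 512, 128),  # 3-layer FC binary classifier, ReLU
    "optimizer": "Adam",
    "lr": 5e-6,                             # initial LR, stepped schedule
    "batch_size": 128,
    "lambda": 0.8,                          # weight in the overall objective, Eq. (7)
    "margin": {"CUB-200": 0.2, "MIMIC-CXR": 0.2, "COCO": 0.3},
}

def hinge_loss(score, label, margin):
    """Standard margin Hinge loss (assumed form; the paper only states
    the margin m). label is +1 for ID samples and -1 for OOD samples."""
    return max(0.0, margin - label * score)

# An OOD sample (label = -1) assigned a positive score is penalized:
loss = hinge_loss(0.5, -1, WOOD_CONFIG["margin"]["COCO"])  # max(0, 0.3 + 0.5) = 0.8
```

A confidently classified ID sample (label = +1, score well above the margin) incurs zero loss, which is what makes the margin m a natural grid-search target per dataset.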