A General-Purpose Multi-Modal OOD Detection Framework

Authors: Viet Quoc Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Wen-Ling Hsu, Han Zhao, Huajie Shao

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on multiple benchmarks demonstrate that the proposed WOOD significantly outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach can achieve superior detection performance in a variety of OOD scenarios. (...) 5 Experiments: In this section, we carry out extensive experimentation to evaluate the performance of the proposed WOOD model on multiple benchmark datasets. Then, we conduct ablation studies to explore how the main components in model design and hyperparameters impact OOD detection performance.
Researcher Affiliation | Collaboration | Viet Duong (EMAIL), Department of Computer Science, William & Mary; Qiong Wu (EMAIL), AT&T Labs; Zhengyi Zhou (EMAIL), AT&T Labs; Eric Zavesky (EMAIL), AT&T Labs; Wen-Ling Hsu (EMAIL), AT&T Labs; Han Zhao (EMAIL), Department of Computer Science, University of Illinois at Urbana-Champaign; Huajie Shao (EMAIL), Department of Computer Science, William & Mary
Pseudocode | Yes | Algorithm 1: The Proposed WOOD Model
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets | Yes | We implement experiments on three real-world datasets: CUB-200 (Wah et al., 2011), MIMIC-CXR (Johnson et al., 2019), and COCO (Lin et al., 2014). For CUB-200, the textual information comes from literature (Reed et al., 2016). COCO and MIMIC-CXR contain images and their textual descriptions.
Dataset Splits | Yes | To improve detection performance on three scenarios simultaneously, we randomly select 1% of the training data and replace them with labeled OOD samples for each scenario, as described in Tab. 1, resulting in a total of 3% OOD samples and 97% ID samples for weakly-supervised training. During evaluation, we use the same ratio (25%) of test samples for ID and each of the three OOD scenarios.
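The weakly-supervised split quoted above (1% of training data replaced with labeled OOD samples per scenario, three scenarios, so 3% OOD and 97% ID overall) can be sketched as follows. This is a minimal illustration, not the authors' code; the dataset size, the random selection procedure, and the scenario-ID labeling scheme are assumptions.

```python
import numpy as np

def make_weakly_supervised_split(n_train, n_scenarios=3, ood_frac=0.01, seed=0):
    """Mark ood_frac of the training indices as labeled OOD for each
    scenario (1% x 3 scenarios = 3% OOD, 97% ID, per the paper).

    Returns an array of labels: 0 = in-distribution, 1..n_scenarios = the
    OOD scenario a replaced sample belongs to (labeling scheme assumed).
    """
    rng = np.random.default_rng(seed)
    labels = np.zeros(n_train, dtype=int)          # start with all-ID
    n_per_scenario = int(n_train * ood_frac)       # 1% per scenario
    # Sample all replaced indices at once, without replacement,
    # so a sample is assigned to at most one OOD scenario.
    replaced = rng.choice(n_train, size=n_per_scenario * n_scenarios,
                          replace=False)
    for s in range(n_scenarios):
        chunk = replaced[s * n_per_scenario:(s + 1) * n_per_scenario]
        labels[chunk] = s + 1
    return labels

labels = make_weakly_supervised_split(10_000)
print((labels > 0).mean())   # 0.03 -> 3% labeled OOD, 97% ID
```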
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | We train the proposed WOOD model using Adam optimizer (Kingma & Ba, 2014)... The paper mentions the Adam optimizer and its authors, but does not specify software versions for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | We set its hidden size to 512, the same as the dimensions of the output embeddings from CLIP_image and CLIP_text. The Binary Classifier is a 3-layer fully connected network with ReLU activation, which outputs a single probability score for binary OOD classification; the layer hidden sizes are 1024, 512, and 128, respectively. We train the proposed WOOD model using the Adam optimizer (Kingma & Ba, 2014) with initial learning rate 5e-6 and a stepped learning rate schedule. Additionally, the batch size is set to 128 in all experiments, and λ = 0.8 for the overall training objective in Eq. (7). Regarding the margin of the Hinge loss, we choose m = 0.2 for CUB-200 and MIMIC-CXR, and m = 0.3 for COCO after grid search.
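The binary classifier described in the setup (3 fully connected layers with ReLU, hidden sizes 1024/512/128, a single probability output, batch size 128) can be sketched as below. This is a hedged NumPy forward-pass illustration, not the authors' implementation: the input dimension of 1024 (assumed to be the concatenation of the 512-d CLIP image and text embeddings) and the He-style weight initialization are assumptions not stated in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryOODClassifier:
    """3-layer fully connected net with ReLU, per the paper's description.

    Hidden sizes 1024/512/128 and the single-probability output are from
    the paper; the 1024-d input (concatenated 512-d CLIP image + text
    embeddings) and the initialization scheme are assumptions.
    """

    def __init__(self, in_dim=1024, hidden=(1024, 512, 128), seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim, *hidden, 1]
        # He-style initialization (assumed, suits ReLU layers).
        self.weights = [rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
                        for d_in, d_out in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(d_out) for d_out in dims[1:]]

    def forward(self, x):
        # ReLU on the three hidden layers.
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        # Final layer outputs a single probability score per sample.
        return sigmoid(x @ self.weights[-1] + self.biases[-1])

clf = BinaryOODClassifier()
batch = np.zeros((128, 1024))   # batch size 128, as in the paper
probs = clf.forward(batch)
print(probs.shape)              # (128, 1)
```

Training itself (Adam, learning rate 5e-6, stepped schedule) is not shown here; any deep-learning framework's standard optimizer loop would apply.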