A General-Purpose Multi-Modal OOD Detection Framework
Authors: Viet Quoc Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Wen-Ling Hsu, Han Zhao, Huajie Shao
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on multiple benchmarks demonstrate that the proposed WOOD significantly outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach can achieve superior detection performance in a variety of OOD scenarios. (...) 5 Experiments: In this section, we carry out extensive experimentation to evaluate the performance of the proposed WOOD model on multiple benchmark datasets. Then, we conduct ablation studies to explore how the main components in model design and hyperparameters impact OOD detection performance. |
| Researcher Affiliation | Collaboration | Viet Duong, Department of Computer Science, William & Mary; Qiong Wu, AT&T Labs; Zhengyi Zhou, AT&T Labs; Eric Zavesky, AT&T Labs; Wen-Ling Hsu, AT&T Labs; Han Zhao, Department of Computer Science, University of Illinois at Urbana-Champaign; Huajie Shao, Department of Computer Science, William & Mary |
| Pseudocode | Yes | Algorithm 1 The Proposed WOOD Model |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We implement experiments on the three real-world datasets: CUB-200 (Wah et al., 2011), MIMIC-CXR (Johnson et al., 2019), and COCO (Lin et al., 2014). For CUB-200, the textual information comes from literature (Reed et al., 2016). COCO and MIMIC-CXR contain images and their textual descriptions. |
| Dataset Splits | Yes | To improve detection performance on three scenarios simultaneously, we randomly select 1% of the training data and replace them with labeled OOD samples for each scenario, as described in Tab. 1, resulting in a total of 3% OOD samples and 97% of ID samples for weakly-supervised training. During evaluation, we use the same ratio (25%) of test samples for ID and three OOD scenarios. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | We train the proposed WOOD model using Adam optimizer (Kingma & Ba, 2014)... The paper mentions the Adam optimizer and its authors, but does not specify software versions for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | We set its hidden size to 512, the same as the dimensions of the output embeddings from CLIP-image and CLIP-text. The Binary Classifier is a 3-layer fully connected network with ReLU activation, which outputs a single probability score for binary OOD classification; the layer hidden sizes are 1024, 512, and 128 respectively. We train the proposed WOOD model using the Adam optimizer (Kingma & Ba, 2014) with initial learning rate 5e-6 and a stepped learning rate schedule. Additionally, the batch size is set to 128 in all experiments, and λ = 0.8 for the overall training objective in Eq. (7). Regarding the margin of the Hinge loss, we choose m = 0.2 for CUB-200 and MIMIC-CXR, and m = 0.3 for COCO after grid search. |
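The weak-supervision split quoted in the Dataset Splits row (1% of training data replaced with labeled OOD samples per scenario, three scenarios, hence 3% OOD and 97% ID) can be sketched in plain Python. The function name and return format below are illustrative, not from the paper:

```python
def weakly_supervised_split(n_train, n_scenarios=3, ood_frac_per_scenario=0.01):
    """Replace a small fraction of ID training samples with labeled OOD
    samples: 1% per OOD scenario across three scenarios, yielding
    3% OOD and 97% ID samples overall, as described in the paper."""
    n_ood_per_scenario = round(n_train * ood_frac_per_scenario)
    n_ood_total = n_ood_per_scenario * n_scenarios
    n_id = n_train - n_ood_total
    return {
        "id": n_id,
        "ood_per_scenario": n_ood_per_scenario,
        "ood_total": n_ood_total,
    }

split = weakly_supervised_split(10_000)
# 100 OOD samples per scenario, 300 OOD in total, 9,700 ID samples
```

At evaluation time the paper instead uses equal 25% shares of test samples for ID and each of the three OOD scenarios.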
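The hyperparameters quoted in the Experiment Setup row can be collected into a single config for reimplementation. The exact form of the Hinge term in Eq. (7) is not quoted above, so the `hinge_loss` below uses the standard margin formulation max(0, m − y·s) with y = +1 for ID and y = −1 for OOD; treat that choice as an assumption:

```python
# Hyperparameters reported in the Experiment Setup row.
WOOD_CONFIG = {
    "embed_dim": 512,                       # matches CLIP image/text output embeddings
    "classifier_hidden": (1024, 512, 128),  # 3-layer FC binary classifier, ReLU
    "optimizer": "Adam",
    "lr": 5e-6,                             # initial LR, stepped schedule
    "batch_size": 128,
    "lambda": 0.8,                          # weight in the overall objective, Eq. (7)
    "margin": {"CUB-200": 0.2, "MIMIC-CXR": 0.2, "COCO": 0.3},
}

def hinge_loss(score, label, margin):
    """Standard margin Hinge loss (assumed form; the paper only states
    the margin m). label is +1 for ID samples and -1 for OOD samples."""
    return max(0.0, margin - label * score)

# An OOD sample (label = -1) assigned a positive score is penalized:
loss = hinge_loss(0.5, -1, WOOD_CONFIG["margin"]["COCO"])  # max(0, 0.3 + 0.5) = 0.8
```

A confidently classified ID sample (label = +1, score well above the margin) incurs zero loss, which is what makes the margin m a natural grid-search target per dataset.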