Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation

Authors: Jihyo Kim, Seulbi Lee, Sangheum Hwang

ICLR 2025

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental. "Experimental results demonstrate that our ReGuide enhances the performance of current LVLMs in both image classification and OoDD tasks."

Researcher Affiliation: Academia. Jihyo Kim, Seulbi Lee, Sangheum Hwang; Department of Data Science, Seoul National University of Science and Technology.

Pseudocode: No. The paper describes the proposed method, Reflexive Guidance, in paragraph text and illustrates it with a framework diagram (Figure 5), but does not provide a formal pseudocode or algorithm block.

Open Source Code: Yes. https://github.com/daintlab/ReGuide

Open Datasets: Yes. "We evaluate the comparison models on the CIFAR10 and ImageNet200 benchmarks proposed in OpenOOD v1.5 (Zhang et al., 2024a). ... The ImageNet200 benchmark consists of ImageNet200 (Russakovsky et al., 2015) as the ID dataset, two datasets NINCO (Bitterwolf et al., 2023) and SSB-Hard (Vaze et al., 2022) as the near-OoD datasets, and three datasets iNaturalist (Van Horn et al., 2018), Textures, and OpenImage-O (Wang et al., 2022) as the far-OoD datasets."

Dataset Splits: Yes. "Due to cost, time, and API rate limits, we use 25% subsets of the benchmarks. A detailed explanation of the benchmark datasets can be found in Appendix B.1. ... We sample 25% of each dataset, ensuring that the proportions of datasets in each benchmark are maintained. During sampling, we maintain the ratio of the number of samples for each label from the original dataset. Tables B.1.1 and B.1.2 present the number of images in each dataset for the ImageNet200 and CIFAR10 benchmarks, respectively."

Hardware Specification: Yes. "All experiments are implemented with Python 3.9 and PyTorch 1.9, using NVIDIA A100 80GB GPUs. For the InternVL2-Llama3-76B model, due to its significant computational requirements, we employ 4 A100 GPUs with the vLLM (Kwon et al., 2023) library to reduce time overhead. For the remaining models, a single A100 GPU is sufficient to run the experiments."

Software Dependencies: Yes. "All experiments are implemented with Python 3.9 and PyTorch 1.9, using NVIDIA A100 80GB GPUs. ... we employ 4 A100 GPUs with the vLLM (Kwon et al., 2023) library to reduce time overhead."

Experiment Setup: Yes. "Our prompt consists of four components: a task description, an explanation of the rejection class, guidelines, and examples for the response format. ... The OoDD task allows us to investigate how LVLMs behave when required to generate responses beyond the categories defined within the user-provided prompt. ... For this experiment, we set the number of negative class suggestions for each group N to 20."
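The class-ratio-preserving 25% subsampling described under Dataset Splits can be sketched as below. This is a minimal illustration of stratified sampling, not code from the ReGuide repository; `stratified_subset` and its arguments are hypothetical names.

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction=0.25, seed=0):
    """Return indices of a subset that keeps `fraction` of each class.

    Sketches the paper's sampling scheme: per-label ratios of the
    original dataset are preserved in the sampled subset.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    subset = []
    for idxs in by_label.values():
        # Keep at least one sample per class so no label disappears.
        k = max(1, round(len(idxs) * fraction))
        subset.extend(rng.sample(idxs, k))
    return sorted(subset)

# Toy example: 40 samples of class 0 and 20 of class 1
# yield 10 and 5 samples respectively at fraction=0.25.
labels = [0] * 40 + [1] * 20
subset = stratified_subset(labels, fraction=0.25)
```

Applying the same routine per dataset within each benchmark would also maintain the relative proportions of the benchmark's constituent datasets, as the paper describes.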