SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
Authors: Woohyeon Park, Woojin Kim, Jaeik Kim, Jaeyoung Do
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SECOND outperforms several baselines across diverse benchmarks, including POPE (Li et al., 2023), VQAv2 (Antol et al., 2015), MMStar (Chen et al., 2024a), and MMBench (Liu et al., 2025), highlighting its effectiveness. |
| Researcher Affiliation | Academia | ¹Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea; ²Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea. Correspondence to: Jaeyoung Do <EMAIL>. |
| Pseudocode | Yes | D. Patch Selection Algorithm |
| Open Source Code | Yes | Code is available at https://github.com/AIDASLab/SECOND. |
| Open Datasets | Yes | Extensive experiments demonstrate that SECOND outperforms several baselines across diverse benchmarks, including POPE (Li et al., 2023), VQAv2 (Antol et al., 2015), MMStar (Chen et al., 2024a), and MMBench (Liu et al., 2025) |
| Dataset Splits | Yes | POPE (Li et al., 2023) is a widely adopted benchmark that specializes in identifying perceptual hallucination by querying the presence of specific objects in a given image through simple yes/no questions. It employs recall, accuracy, and F1 score as the primary evaluation metrics and includes 3k questions derived from well-known datasets such as MSCOCO (Lin et al., 2014), A-OKVQA (Schwenk et al., 2022), and GQA (Hudson & Manning, 2019). In this study, we evaluated the models using the popular split of the POPE benchmark. ... For the general tasks, VQAv2 (Antol et al., 2015) serves as a benchmark for evaluating VLMs' ability to generate answers for given image-question pairs. ... We evaluate the lite version consisting of 0.5k questions ... MMStar comprises 1.5k questions, while MMBench's lite version includes 0.5k samples. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For the several hyperparameters in SECOND, we serve the optimal settings in Appendix C, further analyzing the hyperparameter sensitivity in Sec. 5.5. ... Table 6. Optimal settings of patch selection hyperparameter λ. ... Table 7. Optimal settings of multi-stage CD hyperparameters α, β, and γ. |
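For readers unfamiliar with the decoding strategy the paper builds on, the sketch below illustrates generic contrastive decoding: logits from the full (image-conditioned) pass are contrasted against logits from a visually ablated pass, which suppresses tokens the model would emit regardless of the image (i.e., likely hallucinations). This is an illustrative assumption, not SECOND's exact multi-stage formulation; the function names and the single weight `alpha` are stand-ins for the paper's α, β, γ hyperparameters (Table 7).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_logits(full_logits, ablated_logits, alpha=1.0):
    """Generic contrastive decoding (illustrative, not SECOND's exact rule):
    amplify tokens the image-conditioned model prefers over the
    visually-ablated model via (1 + alpha) * full - alpha * ablated."""
    return [(1 + alpha) * f - alpha * a
            for f, a in zip(full_logits, ablated_logits)]

# Toy 3-token vocabulary: the ablated (image-free) pass strongly prefers
# token 2 (a hallucinated object), so contrastive decoding penalizes it.
full = [1.8, 0.5, 2.0]      # image-conditioned logits (hypothetical)
ablated = [0.2, 0.4, 2.5]   # image-ablated logits (hypothetical)
adjusted = contrastive_logits(full, ablated, alpha=1.0)

# Greedy choice flips from the hallucinated token 2 to token 0.
print(max(range(3), key=lambda i: adjusted[i]))  # -> 0
```

Note that with `alpha=0` the adjusted logits reduce to the full-pass logits, so the weight directly controls how aggressively image-independent (hallucination-prone) tokens are suppressed, which is why the paper reports sensitivity analyses for its CD hyperparameters.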