Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Authors: Nick Jiang, Anish Kachinthaya, Suzanne Petryk, Yossi Gandelsman

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that targeted edits to a model's latent representations can reduce hallucinations by up to 25.7% on the COCO2014 dataset while preserving performance. Our findings demonstrate how a deeper understanding of VLMs' latent representations can enhance reliability and enable novel capabilities, such as zero-shot segmentation. We apply PROJECTAWAY on 5000 random images from the COCO2014 training set on all mentioned COCO objects (i.e. hallucination and CD) individually and measure the removal rate at which objects no longer appear in the caption. We evaluate the strength of the internal confidence c_o as an indicator of object presence. We sample 5000 images from the MSCOCO training set, using the image captioning objective to generate captions with both InstructBLIP and LLaVA. Quantitative results in Table 2 show that we outperform our baselines and reduce hallucinations by 25.7% on InstructBLIP and 23.8% on LLaVA compared to beam search.
Researcher Affiliation | Academia | Nick Jiang, Anish Kachinthaya, Suzanne Petryk, Yossi Gandelsman (University of California, Berkeley)
Pseudocode | Yes | Algorithm 1: PROJECTAWAY
Open Source Code | Yes | Code: https://github.com/nickjiang2378/vl-interp
Open Datasets | Yes | reduce hallucinations by up to 25.7% on the COCO2014 dataset while preserving performance. We use InstructBLIP and LLaVA to caption 5000 random COCO2014 images in the Karpathy validation split (Lin et al., 2015) and determine c_o for all 80 COCO objects... We evaluate our method on the ImageNet validation set. We filter the VQA dataset for color and object-count inaccuracies and correct answers with low confidence scores (c_o < 0.05) using PROJECTAWAY.
Dataset Splits | Yes | We use InstructBLIP and LLaVA to caption 5000 random COCO2014 images in the Karpathy validation split (Lin et al., 2015)... We apply PROJECTAWAY on 5000 random images from the COCO2014 training set... We evaluate across 500 training samples from COCO2014 that have at least one hallucination. We apply these parameters to 500 COCO images from the Karpathy validation set. We evaluate our method on the ImageNet validation set.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models) are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | For InstructBLIP, we set (l_I, l_T, α) = (1, 2, 1.5). For LLaVA, we set (l_I, l_T, α) = (19, 21, 3.5). We set p = 0.9 for nucleus sampling. We use beam search in our method and unify N_beam = 5 for the baseline. We threshold hallucinations as c_o < 0.2 for InstructBLIP and c_o < 0.1 for LLaVA. Based on prior ablations (Section 4.2), we select (l_I = 1, l_T = 2, α = 1.5) for InstructBLIP and (l_I = 19, l_T = 21, α = 3.5) for LLaVA.
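For context on the quoted setup: c_o is a logit-lens style confidence that an object is encoded in the model's image representations, and PROJECTAWAY erases an object by subtracting the component of each image-token hidden state along that object's text embedding, scaled by the strength α, at layers between l_I and l_T. The following is a minimal numpy sketch of these two operations only; the function names, the single-layer framing, and the array shapes are our own illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def project_away(h, t, alpha=1.5):
    """Subtract the component of hidden states h along text embedding t.

    h: (n_tokens, d) image-token hidden states at one layer (illustrative)
    t: (d,) text embedding of the object to remove
    alpha: edit strength (the report quotes 1.5 for InstructBLIP, 3.5 for LLaVA)
    """
    t_unit = t / np.linalg.norm(t)
    coeffs = h @ t_unit                       # projection coefficients, (n_tokens,)
    return h - alpha * np.outer(coeffs, t_unit)

def internal_confidence(h, W_U, object_token_id):
    """Logit-lens estimate of c_o: max softmax probability assigned to the
    object's token across image-token hidden states.

    W_U: (d, vocab) unembedding matrix (hypothetical name).
    """
    logits = h @ W_U                          # (n_tokens, vocab)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return float(probs[:, object_token_id].max())
```

With α = 1 the edit fully removes the component along t (the edited states become orthogonal to the object embedding); the reported settings α > 1 overshoot past orthogonality, pushing the representation away from the object direction.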