Probing Visual Language Priors in VLMs

Authors: Tiange Luo, Ang Cao, Gunhee Lee, Justin Johnson, Honglak Lee

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce ViLP, a benchmark featuring deliberately out-of-distribution images synthesized via image generation models and out-of-distribution Q&A pairs... Although humans achieve near-perfect accuracy, modern VLMs falter; for instance, GPT-4o achieves only 66.17% on ViLP... We demonstrate its effectiveness in LLaVA-v1.5 and Cambrian. Project Page: ViLP.
Researcher Affiliation | Collaboration | 1University of Michigan, 2LG AI Research.
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. The methodology is described in prose and mathematical formulations.
Open Source Code | No | The abstract mentions "Project Page: ViLP." but does not provide a direct link to a source-code repository for the methodology described in the paper, nor does it explicitly state that the code is released or available in supplementary materials.
Open Datasets | Yes | Given a seed image from COCO (Lin et al., 2014), TextVQA (Singh et al., 2019b), or Visual Genome (Krishna et al., 2017), VLMs are tasked with simultaneously selecting appropriate functions...
Dataset Splits | No | The paper mentions using "800k and 400k DPO pairs to fine-tune LLaVA (7B and 13B) and Cambrian-8B, respectively", but does not provide specific training, validation, or test splits for its experiments.
Hardware Specification | Yes | The GPUs we used are 8 L40S.
Software Dependencies | No | The paper refers to pre-trained models such as Stable Diffusion XL, InstructPix2Pix, and Grounded-SAM, but does not provide specific version numbers for these or for other software dependencies such as programming languages or deep-learning frameworks.
Experiment Setup | Yes | Batch sizes are set to 112 for LLaVA-7B, 80 for LLaVA-13B, and 8 (with 4 gradient accumulation steps) for Cambrian-8B. We employ LoRA with a rank of 128, an alpha of 256, and a learning rate of 5e-7, training each model for 2 epochs.
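The quoted setup implies a few derived quantities worth making explicit, such as the effective batch size under gradient accumulation and the LoRA scaling factor. A minimal plain-Python sketch (hyperparameter values are taken from the quote; the dictionary layout and helper-function names are our own illustration, not code from the paper):

```python
# Hyperparameters quoted in the paper's experiment setup.
SETUPS = {
    "LLaVA-7B":    {"batch_size": 112, "grad_accum": 1},
    "LLaVA-13B":   {"batch_size": 80,  "grad_accum": 1},
    "Cambrian-8B": {"batch_size": 8,   "grad_accum": 4},
}

LORA_RANK = 128      # rank r of the low-rank update matrices
LORA_ALPHA = 256     # LoRA alpha
LEARNING_RATE = 5e-7
EPOCHS = 2


def effective_batch_size(setup: dict) -> int:
    """Examples seen per optimizer step after gradient accumulation."""
    return setup["batch_size"] * setup["grad_accum"]


def lora_scaling(alpha: int, rank: int) -> float:
    """LoRA adds (alpha / rank) * B @ A to the frozen base weights,
    so alpha / rank is the effective scale of the learned update."""
    return alpha / rank


print(effective_batch_size(SETUPS["Cambrian-8B"]))  # 8 * 4 = 32
print(lora_scaling(LORA_ALPHA, LORA_RANK))          # 256 / 128 = 2.0
```

With these values, Cambrian-8B's effective batch size (32) remains well below the LLaVA batch sizes, and the alpha-to-rank ratio of 2.0 doubles the contribution of the learned low-rank update relative to a rank-equal-alpha configuration.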