VLM’s Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models

Authors: Nam Hyeon-Woo, Moon Ye-Bin, Wonseok Choi, Lee Hyun, Tae-Hyun Oh

TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we propose an eye examination process to investigate how a VLM perceives images, focusing on key aspects of visual recognition, ranging from basic color and shape to semantic understanding. We introduce a dataset, LENS, to guide VLMs to follow the examination and check their readiness. Once the model is ready, we conduct the examination. We quantify and visualize VLMs' sensitivities to color, shape, and semantic matching. Our findings reveal that VLMs have varying sensitivity to different colors while consistently showing insensitivity to green across different VLMs. We also found that shape sensitivity and semantic recognition vary with the LLM's capacity, despite using the same fixed visual encoder.
Researcher Affiliation | Collaboration | Nam Hyeon-Woo (EMAIL), Department of Electrical Engineering, POSTECH; Moon Ye-Bin (EMAIL), Department of Electrical Engineering, POSTECH; Wonseok Choi (EMAIL), Grad. School of AI, POSTECH; Lee Hyun (EMAIL), Department of Electrical Engineering, POSTECH & Samsung AI Center, Samsung Electronics; Tae-Hyun Oh (EMAIL), School of Computing, KAIST & Department of Electrical Engineering & Grad. School of AI, POSTECH
Pseudocode | No | The paper describes the steps for the eye examination process (e.g., color test steps 1-3 in Section 3.1) in paragraph text. It does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured code-like formatting.
Open Source Code | No | The paper does not provide an explicit statement about releasing code for the methodology, nor does it include a direct link to a code repository.
Open Datasets | No | "We introduce a dataset, LENS, to guide VLMs to follow the examination and check its readiness. More details and data statistics can be found in the Appendix. The statistics of our LENS are in Table 4." However, no specific link, DOI, or repository for the LENS dataset is provided to ensure public access.
Dataset Splits | Yes | "To give an instruction to a model about how to perform the examination, we finetune VLMs using LoRA (Hu et al., 2022) on the training set of LENS. Then, the test set of LENS is utilized to check the model's understanding of the instructions."

Table 4: Statistics of the LENS dataset.
Split      | Color | Shape | Semantic (yes or no) | Semantic (1 or 2) | Semantic (Patch)
Train      | 2,648 | 6,720 | 3,500                | 1,820             | 3,500 × 3
Validation |   568 | 3,360 | 1,000                |   520             | 1,500 × 3
Hardware Specification | Yes | "We use 8 A100 80G GPUs for our experiments."
Software Dependencies | No | The paper mentions using LoRA (Hu et al., 2022), the Adam optimizer (Kingma & Ba, 2015), and models such as LLaVA (Liu et al., 2023b) and InstructBLIP (Dai et al., 2023). However, it does not specify version numbers for general software dependencies such as programming languages (e.g., Python 3.x) or deep learning frameworks (e.g., PyTorch 1.x).
Experiment Setup | Yes | "We set the training epoch as 2, batch size 128, and learning rate 0.0002 with cosine scheduling, Adam optimizer (Kingma & Ba, 2015) and gradient checkpointing."
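The stated schedule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the cosine-decay formula is one common variant, and the step count assumes the LENS train subsets from Table 4 are simply concatenated and consumed with batch size 128 over 2 epochs, as the setup states.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-4):
    # Cosine-annealed learning rate: starts at base_lr, decays to 0
    # by the final step (one common formulation of "cosine scheduling").
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

# Hypothetical step count: LENS train split sizes from Table 4,
# batch size 128 and 2 training epochs as stated in the paper.
train_examples = 2648 + 6720 + 3500 + 1820 + 3500 * 3  # 25,188 examples
steps_per_epoch = math.ceil(train_examples / 128)
total_steps = 2 * steps_per_epoch

print(total_steps)                          # 394 optimizer steps
print(cosine_lr(0, total_steps))            # 0.0002 at step 0
print(cosine_lr(total_steps, total_steps))  # 0.0 at the final step
```

Under these assumptions the run is short (a few hundred optimizer steps), which is consistent with LoRA-style finetuning meant only to teach the examination format rather than new visual skills.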