CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che, Hongyang Chen

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.
Researcher Affiliation Collaboration Yuxuan Wang¹, Yijun Liu², Fei Yu¹, Chen Huang¹, Kexin Li¹, Zhiguo Wan¹, Wanxiang Che², Hongyang Chen¹*; ¹Zhejiang Lab, Hangzhou, 311121; ²Harbin Institute of Technology, Harbin, 150001
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code No The paper provides a link for the 'Datasets' at https://github.com/WangYuxuan93/CVLUE, but does not explicitly state that the source code for the methodology described in the paper is available there or elsewhere.
Open Datasets Yes To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset... Datasets https://github.com/WangYuxuan93/CVLUE ... The proposed dataset will be made publicly available for research purposes (under the CC BY-NC-ND 4.0 license) after the paper gets accepted.
Dataset Splits Yes Table 2: Data splits (in terms of image numbers) and evaluation metrics of tasks in CVLUE. Task: |Train| / |Valid| / |Test|, Metrics. ITR: 17,920 / 3,116 / 8,973, R@k. VQA: 14,362 / 2,571 / 7,169, Acc. VG: 10,769 / 1,965 / 5,385, IoU. VD: 3,975 / 651 / 2,036, R@k.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies No The paper lists several baseline VLMs (CCLM, X2VLM, Qwen-VL, Qwen-VL-Chat, mPLUG-Owl2) but does not provide specific version numbers for these or for any other ancillary software components, libraries, or programming languages used.
Experiment Setup Yes Please refer to the Appendix for prompts used in the zero-shot setting and detailed fine-tuning setups.
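
As a quick sanity check on the Table 2 counts quoted in the Dataset Splits row, the following minimal Python sketch computes each task's total image count and the share of images in each split. The counts are copied from the table; the names `SPLITS`, `split_fractions`, `TOTALS`, and `FRACTIONS` are illustrative and do not come from the paper or its repository.

```python
# Image counts per split, copied from Table 2 as quoted in the summary above.
SPLITS = {
    "ITR": {"train": 17920, "valid": 3116, "test": 8973},
    "VQA": {"train": 14362, "valid": 2571, "test": 7169},
    "VG": {"train": 10769, "valid": 1965, "test": 5385},
    "VD": {"train": 3975, "valid": 651, "test": 2036},
}


def split_fractions(counts):
    """Return each split's share of the task's total image count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical helper tables, for illustration only.
TOTALS = {task: sum(counts.values()) for task, counts in SPLITS.items()}
FRACTIONS = {task: split_fractions(counts) for task, counts in SPLITS.items()}

if __name__ == "__main__":
    for task in SPLITS:
        shares = ", ".join(f"{k}={v:.1%}" for k, v in FRACTIONS[task].items())
        print(f"{task}: total={TOTALS[task]} ({shares})")
```

Under these counts, every task works out to roughly a 60/10/30 train/valid/test split by image number.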