CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che, Hongyang Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture. |
| Researcher Affiliation | Collaboration | Yuxuan Wang¹, Yijun Liu², Fei Yu¹, Chen Huang¹, Kexin Li¹, Zhiguo Wan¹, Wanxiang Che², Hongyang Chen¹*; ¹Zhejiang Lab, Hangzhou, 311121; ²Harbin Institute of Technology, Harbin, 150001 |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link for the 'Datasets' at https://github.com/WangYuxuan93/CVLUE, but does not explicitly state that the source code for the methodology described in the paper is also available there or elsewhere. |
| Open Datasets | Yes | To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset... Datasets https://github.com/WangYuxuan93/CVLUE ... The proposed dataset will be made publicly available for research purposes (under the CC BY-NC-ND 4.0 license) after the paper gets accepted. |
| Dataset Splits | Yes | Table 2: Data splits (in terms of image numbers) and evaluation metrics of tasks in CVLUE. Task: train / valid / test (metric). ITR: 17,920 / 3,116 / 8,973 (R@k); VQA: 14,362 / 2,571 / 7,169 (Acc); VG: 10,769 / 1,965 / 5,385 (IoU); VD: 3,975 / 651 / 2,036 (R@k). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper lists several baseline VLMs (CCLM, X2VLM, Qwen-VL, Qwen-VL-Chat, mPLUG-Owl2) but does not provide specific version numbers for these or any other ancillary software components, libraries, or programming languages used. |
| Experiment Setup | Yes | Please refer to the Appendix for prompts used in the zero-shot setting and detailed fine-tuning setups. |
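
The Dataset Splits row names three evaluation metrics: R@k, Acc, and IoU. The sketch below shows how such metrics are commonly computed; it is not the paper's evaluation code. The function names, the corner-format boxes `(x1, y1, x2, y2)`, and the per-query hit definition of recall at k are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Per-query hit indicator (assumed definition): 1.0 if any relevant
    item appears among the top k ranked items, else 0.0. Average over
    queries to get a dataset-level R@k."""
    return 1.0 if set(ranked_ids[:k]) & set(relevant_ids) else 0.0


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

Exact-match accuracy and a single-threshold-free IoU are the simplest variants; a real evaluation may normalize answers or count a grounding prediction as correct only above some IoU threshold, which this summary does not specify.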