reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che, Hongyang Chen

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal their performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.
Researcher Affiliation	Collaboration	Yuxuan Wang¹, Yijun Liu², Fei Yu¹, Chen Huang¹, Kexin Li¹, Zhiguo Wan¹, Wanxiang Che², Hongyang Chen¹*; ¹Zhejiang Lab, Hangzhou, 311121; ²Harbin Institute of Technology, Harbin, 150001
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper provides a link for the 'Datasets' at https://github.com/WangYuxuan93/CVLUE, but does not explicitly state that the source code for the methodology described in the paper is also available there or elsewhere.
Open Datasets	Yes	To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset... Datasets https://github.com/WangYuxuan93/CVLUE ... The proposed dataset will be made publicly available for research purposes (under the CC BY-NC-ND 4.0 license) after the paper gets accepted.
Dataset Splits	Yes	Table 2: Data splits (in terms of image numbers) and evaluation metrics of tasks in CVLUE. Columns: Task, |Train|, |Valid|, |Test|, Metrics. ITR: 17,920 / 3,116 / 8,973, R@k; VQA: 14,362 / 2,571 / 7,169, Acc; VG: 10,769 / 1,965 / 5,385, IoU; VD: 3,975 / 651 / 2,036, R@k.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies	No	The paper lists several baseline VLMs (CCLM, X2VLM, Qwen-VL, Qwen-VL-Chat, mPLUG-Owl2) but does not provide specific version numbers for these or any other ancillary software components, libraries, or programming languages used.
Experiment Setup	Yes	Please refer to the Appendix for prompts used in the zero-shot setting and detailed fine-tuning setups.
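
The image counts quoted in the Dataset Splits row can be turned into per-task totals and split proportions with a few lines of Python. This is a minimal sketch, not something taken from the paper or from the index pipeline: the names `SPLITS` and `split_summary` are hypothetical, and the counts are simply copied from Table 2 as quoted above.

```python
# Image counts per task, copied from Table 2 as quoted in the Dataset Splits row.
# SPLITS and split_summary are illustrative names, not part of any CVLUE tooling.
SPLITS = {
    "ITR": {"train": 17920, "valid": 3116, "test": 8973},
    "VQA": {"train": 14362, "valid": 2571, "test": 7169},
    "VG": {"train": 10769, "valid": 1965, "test": 5385},
    "VD": {"train": 3975, "valid": 651, "test": 2036},
}


def split_summary(splits):
    """Return {task: (total_images, {split_name: fraction_of_total})}."""
    summary = {}
    for task, counts in splits.items():
        total = sum(counts.values())
        fractions = {name: n / total for name, n in counts.items()}
        summary[task] = (total, fractions)
    return summary


if __name__ == "__main__":
    for task, (total, fractions) in split_summary(SPLITS).items():
        shares = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
        print(f"{task}: {total:,} images ({shares})")
```

With these counts, every task comes out at roughly a 60/10/30 train/valid/test division.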