EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Authors: Muye Huang, Han Lai, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.
Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Xi'an Jiaotong University; (2) MOE KLINNS Lab, Xi'an Jiaotong University; (3) Shaanxi Province Key Laboratory of Big Data Knowledge Engineering. EMAIL, EMAIL
Pseudocode | No | The paper describes the EvoChart method in Section 3 with sub-sections such as 'Compositional Chart Generation', 'Chart Evaluation and Refinement', and 'QA-pairs Generation and Training'. These sections describe the procedural steps but do not present them as structured pseudocode or an algorithm block.
Open Source Code | Yes | Homepage: https://github.com/MuyeHuang/EvoChart
Open Datasets | Yes | Homepage: https://github.com/MuyeHuang/EvoChart. EvoChart-QA is a comprehensive and challenging benchmark for real-world chart understanding. We carefully selected 625 charts with diverse appearances, all sourced from real-world websites. Then we curated 1250 chart-based understanding questions through human experts. This process ensures that EvoChart-QA accurately reflects real-world scenarios.
Dataset Splits | No | The abstract describes the EvoChart-QA benchmark as consisting of '650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions', and Section 4 gives further details of its construction. However, the paper does not provide explicit training, validation, or test splits for this dataset, nor for the ChartQA dataset it also uses for evaluation.
Hardware Specification | Yes | All experiments were completed on 4 NVIDIA A800 80G GPUs.
Software Dependencies | No | The paper mentions employing 'ECharts (Li et al. 2018) for rendering charts', but it does not give version numbers for ECharts or for any other key software dependency or library used in the experiments.
Experiment Setup | Yes | In EvoChart method, we utilize Phi3-Vision (Abdin et al. 2024) as the initialization model. We conducted a 3-Stage data synthesis and training process, with each Stage undergoing 1 Epoch of SFT with a learning rate of 2e-5 and using cosine learning rate scheduler.
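
The Experiment Setup row names a cosine learning rate scheduler with a learning rate of 2e-5, applied over three stages of one epoch each. The sketch below illustrates what such a schedule could look like in plain Python. It is not the authors' training code: the absence of warmup, the decay floor of 0.0, the restart of the schedule at each stage, and the steps_per_epoch value are all assumptions made for illustration, since the summary above does not state them.

```python
import math

PEAK_LR = 2e-5         # learning rate stated in the Experiment Setup row
NUM_STAGES = 3         # "3-Stage data synthesis and training process"
EPOCHS_PER_STAGE = 1   # "each Stage undergoing 1 Epoch of SFT"


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Cosine decay from peak_lr down to min_lr over total_steps.

    Assumptions (not stated in the source): no warmup phase, min_lr = 0.0.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def stage_schedule(steps_per_epoch):
    """Per-step learning rates for one stage.

    Assumption: the schedule restarts from peak_lr at the start of each stage.
    """
    total_steps = steps_per_epoch * EPOCHS_PER_STAGE
    return [cosine_lr(step, total_steps) for step in range(total_steps)]


if __name__ == "__main__":
    steps_per_epoch = 1000  # hypothetical; depends on dataset and batch size
    for stage in range(1, NUM_STAGES + 1):
        lrs = stage_schedule(steps_per_epoch)
        print(f"stage {stage}: first lr = {lrs[0]:.2e}, last lr = {lrs[-1]:.2e}")
```

In practice a training framework's built-in cosine scheduler would normally be used instead. The hand-written version only makes the shape of the decay explicit.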