reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Authors: Muye Huang, Han Lai, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.
Researcher Affiliation	Academia	(1) School of Computer Science and Technology, Xi'an Jiaotong University; (2) MOE KLINNS Lab, Xi'an Jiaotong University; (3) Shaanxi Province Key Laboratory of Big Data Knowledge Engineering. EMAIL, EMAIL
Pseudocode	No	The paper describes the EvoChart method in Section 3 with sub-sections like 'Compositional Chart Generation', 'Chart Evaluation and Refinement', and 'QA-pairs Generation and Training'. These sections describe the procedural steps but do not present them in structured pseudocode or an algorithm block format.
Open Source Code	Yes	Homepage https://github.com/MuyeHuang/EvoChart
Open Datasets	Yes	Homepage https://github.com/MuyeHuang/EvoChart and EvoChart-QA is a comprehensive and challenging benchmark for real-world chart understanding. We carefully selected 625 charts with diverse appearances, all sourced from real-world websites. Then we curated 1250 chart-based understanding questions through human experts. This process ensures that EvoChart-QA accurately reflects real-world scenarios.
Dataset Splits	No	The paper describes the EvoChart-QA benchmark as consisting of '650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions' in the abstract, and further details its construction in Section 4. However, it does not provide explicit training, validation, or test splits for this dataset, nor for the ChartQA dataset it also uses for evaluation.
Hardware Specification	Yes	All experiments were completed on 4 NVIDIA A800 80G GPUs.
Software Dependencies	No	The paper mentions employing 'ECharts (Li et al. 2018) for rendering charts', but it does not provide specific version numbers for ECharts or any other key software dependencies or libraries used in the experiments.
Experiment Setup	Yes	In EvoChart method, we utilize Phi3-Vision (Abdin et al. 2024) as the initialization model. We conducted a 3-Stage data synthesis and training process, with each Stage undergoing 1 Epoch of SFT with a learning rate of 2e-5 and using cosine learning rate scheduler.
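
The experiment setup above names a peak learning rate of 2e-5 and a cosine learning rate scheduler, but the quoted text gives no warmup length, minimum learning rate, or step counts. The following is a minimal sketch of a plain cosine decay under those gaps, not the authors' training code: the function name `cosine_lr`, the zero floor, the absence of warmup, the 1000-step stage length, and the choice to restart the schedule at each of the three stages are all assumptions made for illustration.

```python
import math


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under a single cosine decay from base_lr to min_lr.

    base_lr=2e-5 comes from the quoted setup; min_lr=0.0 and the lack of
    warmup are assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the schedule completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical number of optimizer steps in one epoch of one stage.
STEPS_PER_STAGE = 1000

if __name__ == "__main__":
    # Three stages, one epoch each; restarting the schedule per stage is an assumption.
    for stage in range(1, 4):
        start = cosine_lr(0, STEPS_PER_STAGE)
        middle = cosine_lr(STEPS_PER_STAGE // 2, STEPS_PER_STAGE)
        end = cosine_lr(STEPS_PER_STAGE, STEPS_PER_STAGE)
        print(f"stage {stage}: start={start:.2e} middle={middle:.2e} end={end:.2e}")
```

Under these assumptions the rate starts at 2e-5, passes through roughly 1e-5 at the midpoint of a stage, and decays to the floor at the end. Anyone reproducing the paper should take the actual schedule parameters from the released repository rather than from this sketch.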