MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

Authors: Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Shanghang Zhang, Gao Peng, Hongsheng Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On various mathematical benchmarks, our MAVIS-7B achieves leading results among open-source MLLMs, e.g., surpassing other 7B models by +9.3% and the second-best LLaVA-NeXT (110B) by +6.9%, demonstrating the effectiveness of our method. Data and models are released at https://github.com/ZrrSkywalker/MAVIS. ... We evaluate our model MAVIS-7B on several popular mathematical benchmarks, MathVerse (Zhang et al., 2024b), GeoQA (Chen et al., 2021c), FunctionQA (function problems in MathVista (Lu et al., 2023)), MMMU-Math (the math problems in MMMU (Yue et al., 2023a)), MathVision (Wang et al., 2024b), three mathematical categories in MathVista, and We-Math (Qiao et al., 2024). We compare a variety of existing MLLMs...
Researcher Affiliation | Academia | 1 CUHK MMLab & 2 MiuLar Lab, 3 Peking University, 4 Shanghai AI Laboratory, 5 CPII under InnoHK. EMAIL, EMAIL
Pseudocode | No | The paper describes the data generation process and training pipeline in natural language and flowcharts (Figure 2), and uses mathematical formulations (Equations 1-3 in Section A.4.1), but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Data and models are released at https://github.com/ZrrSkywalker/MAVIS.
Open Datasets | Yes | With this approach, we curate two datasets, MAVIS-Caption (558K diagram-caption pairs) and MAVIS-Instruct (834K visual math problems with CoT rationales), and propose four progressive stages for training MLLMs from scratch. ... Data and models are released at https://github.com/ZrrSkywalker/MAVIS.
Dataset Splits | Yes | We evaluate our model MAVIS-7B on several popular mathematical benchmarks, MathVerse (Zhang et al., 2024b), GeoQA (Chen et al., 2021c), FunctionQA (function problems in MathVista (Lu et al., 2023)), MMMU-Math (the math problems in MMMU (Yue et al., 2023a)), MathVision (Wang et al., 2024b), three mathematical categories in MathVista, and We-Math (Qiao et al., 2024). ... we conduct an ablation study on the 834K MAVIS-Instruct dataset by randomly sampling 25%, 50%, and 75% of the data for instruction tuning, excluding the DPO stage.
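The ablation quoted above draws random 25% / 50% / 75% subsets of MAVIS-Instruct for instruction tuning. A minimal sketch of such reproducible subsampling (the function name, seed, and placeholder dataset are our own, not from the MAVIS codebase):

```python
# Sketch of random fractional subsampling as described in the ablation.
# The dataset here is a stand-in list of indices, not the real 834K examples.
import random

def sample_fraction(dataset, fraction, seed=0):
    """Return a reproducible random subset covering `fraction` of the data."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = int(len(dataset) * fraction)
    return rng.sample(dataset, k)

full = list(range(834_000))  # placeholder for the 834K instruction examples
subset = sample_fraction(full, 0.25)
```

Fixing the seed lets each fraction be re-drawn identically across runs, which matters when comparing the 25%/50%/75% tuning results.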
Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU or CPU models used for running the experiments.
Software Dependencies | No | The logic of the data engine is implemented in Python, and we employ Matplotlib for the graphical rendering of the diagrams. However, specific version numbers for Python, Matplotlib, or other software libraries are not provided.
Experiment Setup | Yes | In the first stage, we fine-tune the CLIP for 10 epochs with a batch size 16 and an initial learning rate 2e-6. In the second stage, we train the diagram-language alignment for 1 epoch with a batch size 32 and an initial learning rate 2e-6, and adopt LoRA (Hu et al., 2021) with a rank 128. In the third and fourth stages, we adopt the same training settings as the second one.
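The staged settings quoted above can be collected into a single configuration sketch (the key names are our own shorthand, and labeling stage 4 as DPO follows the paper's pipeline description; the stage-3/4 entries simply copy stage 2 as stated):

```python
# Per-stage training settings quoted from the paper, gathered into one dict.
# Field names are illustrative shorthand, not identifiers from the MAVIS code.
TRAINING_STAGES = {
    "stage1_clip_finetune": {
        "epochs": 10, "batch_size": 16, "lr": 2e-6, "lora": None,
    },
    "stage2_diagram_language_alignment": {
        "epochs": 1, "batch_size": 32, "lr": 2e-6, "lora": {"rank": 128},
    },
}
# Stages 3 and 4 adopt the same settings as stage 2, per the paper.
TRAINING_STAGES["stage3_instruction_tuning"] = dict(
    TRAINING_STAGES["stage2_diagram_language_alignment"])
TRAINING_STAGES["stage4_dpo"] = dict(
    TRAINING_STAGES["stage2_diagram_language_alignment"])
```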