G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Authors: Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing HONG, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Utilizing the Geo170K dataset, we introduce G-LLaVA, a model that demonstrates exceptional performance in solving geometric problems. It significantly outperforms GPT4-V on the geometry task of the MathVista benchmark with only 7B parameters. ...We evaluate G-LLaVA on the geometry problem solving (GPS) task (testmini split) of MathVista (Lu et al., 2023) and the test set of GeoQA. ...Main Experiments. We compare MLLMs on the testmini split of the MathVista (Lu et al., 2023) benchmark in Table 8. |
| Researcher Affiliation | Collaboration | Jiahui Gao1,2, Renjie Pi3, Jipeng Zhang3, Jiacheng Ye2, Wanjun Zhong1, Yufei Wang1, Lanqing Hong1, Jianhua Han1, Hang Xu1, Zhenguo Li1, Lingpeng Kong2 1Noah's Ark Lab 2The University of Hong Kong 3The Hong Kong University of Science and Technology EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It describes methodologies in narrative text and uses figures to illustrate concepts, but no formal algorithm listings. |
| Open Source Code | Yes | Our code, data, and models are publicly accessible at https://github.com/pipilurj/G-LLaVA. |
| Open Datasets | Yes | This dataset, named Geo170K, contains more than 170K geometric image-caption and question-answer pairs. ...Our code, data, and models are publicly accessible at https://github.com/pipilurj/G-LLaVA. |
| Dataset Splits | Yes | We evaluate G-LLaVA on the geometry problem solving (GPS) task (testmini split) of MathVista (Lu et al., 2023) and the test set of GeoQA. ...More details of the data split on GeoQA and GeoQA+ are listed in Table 16. Table 16 (train / validation / test): GeoQA+ (Cao and Xiao, 2022): 6027 / 745 / 754; GeoQA (Chen et al., 2021): 3499 / 745 / 754. |
| Hardware Specification | Yes | For training G-LLa VA-7B, each run requires 10 hours on 8 A40 GPUs (48G of memory). |
| Software Dependencies | Yes | We employ ChatGPT (gpt-3.5-turbo-0613) for data generation. ...The LLM part of G-LLaVA utilizes LLaMA-2 (Touvron et al., 2023) as the language model and employs the pretrained vision transformer of Radford et al. (2021) as the vision encoder. We conduct experiments with both 7B and 13B LLMs. |
| Experiment Setup | Yes | During training, the learning rate is set to 3e-5. We expand the images into squares during training, where the extended background color is set to white. For image augmentation, we set the maximum translation distance to 0.25 of the length of the longer side. If not otherwise specified, the models are trained for 1 epoch for cross-modal alignment and 2 epochs for instruction tuning, respectively. The batch sizes are set to 6 and 32 per GPU, respectively. |
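The image preprocessing quoted above (padding images into white-background squares, then random translation by up to 0.25 of the longer side) can be sketched as follows. This is a minimal illustration, not the authors' released code; the function names and the numpy-array image representation are assumptions.

```python
import numpy as np

def pad_to_square(img: np.ndarray, fill: int = 255) -> np.ndarray:
    """Expand an HxWxC image into a square, centered on a white canvas."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.full((side, side, img.shape[2]), fill, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

def random_translate(img: np.ndarray, rng: np.random.Generator,
                     max_frac: float = 0.25, fill: int = 255) -> np.ndarray:
    """Shift the image by up to max_frac of its longer side, filling with white."""
    h, w = img.shape[:2]
    max_shift = int(max_frac * max(h, w))
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))
    canvas = np.full_like(img, fill)
    # Copy only the region that remains inside the frame after the shift.
    ys, ye = max(dy, 0), min(h + dy, h)
    xs, xe = max(dx, 0), min(w + dx, w)
    canvas[ys:ye, xs:xe] = img[ys - dy:ye - dy, xs - dx:xe - dx]
    return canvas
```

A training pipeline would apply `pad_to_square` first, then `random_translate` per sample; the quoted setup does not specify the order or interpolation details, so those remain assumptions here.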