GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
Authors: Maizhen Ning, Zihao Zhou, Qiufeng Wang, Xiaowei Huang, Kaizhu Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, our Phi3-Vision-based MLLM wins first place on the PGPs solving task of MathVista benchmark, outperforming GPT-4o, Gemini Ultra and other much larger MLLMs. While LLaVA-13B-based MLLM markedly exceeded other close-source and open-source MLLMs on the MathVerse benchmark and also achieved the new SOTA on GeoQA dataset. |
| Researcher Affiliation | Academia | Maizhen Ning*1,2, Zihao Zhou*1,2, Qiufeng Wang1, Xiaowei Huang2, Kaizhu Huang3; 1School of Advanced Technology, Xi'an Jiaotong-Liverpool University; 2University of Liverpool; 3Duke Kunshan University; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in prose, detailing the components of GNS: Knowledge Prediction, Symbolic Parsing, Problem Reasoning, and Symbolic Computation, often using mathematical formulations. However, it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps. |
| Open Source Code | Yes | Project https://github.com/ning-mz/GNS |
| Open Datasets | Yes | To handle the data issue, we construct a multi-task plane geometry problem related dataset GNS-260K, which is the largest PGPs dataset so far. [...] We construct GNS-260K based on two existing PGP datasets: PGPS9K (Zhang, Yin, and Liu 2023) and GeoQA+ (Cao and Xiao 2022). |
| Dataset Splits | Yes | To obtain a standard performance measurement with different MLLMs rather than simply test on a single base plane geometry problem dataset, we selected two benchmarks including MathVista (Lu et al. 2024b) and MathVerse (Zhang et al. 2024b). Specifically, we evaluate the Geometry Problem Solving task from the testmini set of MathVista (GPS) and the entire testmini set of MathVerse. [...] The testmini set in MathVerse has 3,940 samples in total and 64.7% of them are plane geometry problems. [...] All diagrams and base problems are from the existing PGP datasets including the training set of both PGPS9K (Zhang, Yin, and Liu 2023) and GeoQA+ (Cao and Xiao 2022) |
| Hardware Specification | Yes | trained on 4 NVIDIA A800 80GB GPUs. |
| Software Dependencies | No | we use a Python library SymPy (Meurer et al. 2017) as the symbolic computation tool. While the paper mentions SymPy as a Python library, it does not provide specific version numbers for either Python or SymPy. |
| Experiment Setup | Yes | We fully finetune the MLLMs with learning rate 5e-5 for DeepSeek-VL-1.3B and 3e-5 for the others, 2 epochs training, batch size 8 per GPU and trained on 4 NVIDIA A800 80GB GPUs. |
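
The reported experiment setup can be collected into a small configuration object. The following is a minimal sketch, not the authors' training code: the `FinetuneConfig` class, the `config_for` helper, and the field names are hypothetical, and the global batch size assumes plain data parallelism without gradient accumulation, which the paper excerpt does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""

    model_name: str
    learning_rate: float
    num_epochs: int = 2          # "2 epochs training"
    per_gpu_batch_size: int = 8  # "batch size 8 per GPU"
    num_gpus: int = 4            # "4 NVIDIA A800 80GB GPUs"

    @property
    def global_batch_size(self) -> int:
        # Assumption: data parallelism with no gradient accumulation.
        return self.per_gpu_batch_size * self.num_gpus


def config_for(model_name: str) -> FinetuneConfig:
    """Pick the learning rate by base model, following the quoted setup."""
    # 5e-5 for DeepSeek-VL-1.3B, 3e-5 for every other base model.
    lr = 5e-5 if model_name == "DeepSeek-VL-1.3B" else 3e-5
    return FinetuneConfig(model_name=model_name, learning_rate=lr)


if __name__ == "__main__":
    for name in ("DeepSeek-VL-1.3B", "LLaVA-13B", "Phi3-Vision"):
        cfg = config_for(name)
        print(name, cfg.learning_rate, cfg.num_epochs, cfg.global_batch_size)
```

Under these assumptions each optimizer step would see 8 × 4 = 32 samples. If the actual run used gradient accumulation, the effective batch size would be larger.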