MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
Authors: Erle Zhu, Yadi Liu, Zhe Zhang, Xujun Li, Jin Zhou, Xinjie Yu, Minlie Huang, Hongning Wang
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Validated using our collected college-level circuit analysis problems, MAPS significantly improves reasoning accuracy of MLLM and outperforms all existing models. The results confirm MAPS offers a promising direction for enhancing multi-modal scientific reasoning ability of MLLMs. Our code is available at https://github.com/thu-coai/MAPS. |
| Researcher Affiliation | Academia | 1The Conversational AI (CoAI) Group, 2Department of Computer Science & Technology, 3Department of Electrical Engineering, Tsinghua University, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 MAPS: Inference Phase |
| Open Source Code | Yes | Our code is available at https://github.com/thu-coai/MAPS. |
| Open Datasets | No | To evaluate the entire MAPS framework on real-world physical problems, we collected 79 high-quality circuit analysis problems from related textbooks and name it Simple Circuit Eval. Simple Circuit Eval is constructed based on exercise problems primarily collected from Chinese circuit analysis textbooks, but since current MLLMs are primarily multilingual and the linguistic type is not an influencing factor in our framework, this should not affect the evaluation of different MLLMs on this dataset. |
| Dataset Splits | Yes | ppm-syn-lprc contains 20k pairs of synthetic circuit diagrams and their simulation descriptions, divided into training, validation, and test sets in a ratio of 8:1:1. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU/CPU models or other detailed computer specifications for their own experimental setup. It mentions using CogVLM-17B and GPT-4V, which are models, but not the hardware they ran these models on for their experiments. |
| Software Dependencies | No | We use Ngspice (Nenzi & Vogt, 2011) developed by the UC Berkeley CAD Group as our simulator. |
| Experiment Setup | Yes | We list our main hyperparameters used for PPM training at Table 6. Table 6: Main Hyper-parameters of PPM Training — lora-rank: 50; max-length: 2000; batch-size: 32; train-iters: 2000; optimizer: Adam; learning-rate: 1e-5; lr-decay-style: cosine; warmup: 0.2 |
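The ppm-syn-lprc split quoted above (20k diagram/description pairs divided 8:1:1 into train/validation/test) can be reproduced mechanically. The sketch below is illustrative only: the function name `split_dataset` and the fixed seed are assumptions, not taken from the paper's released code.

```python
import random

def split_dataset(pairs, ratios=(8, 1, 1), seed=0):
    """Shuffle and split a list of items (e.g. diagram/description pairs)
    into train/val/test subsets by integer ratios, 8:1:1 by default."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    total = sum(ratios)
    n_train = len(pairs) * ratios[0] // total
    n_val = len(pairs) * ratios[1] // total
    train = [pairs[i] for i in indices[:n_train]]
    val = [pairs[i] for i in indices[n_train:n_train + n_val]]
    test = [pairs[i] for i in indices[n_train + n_val:]]
    return train, val, test

# For 20k pairs, an 8:1:1 split yields 16000 / 2000 / 2000 items.
train, val, test = split_dataset(list(range(20000)))
print(len(train), len(val), len(test))  # → 16000 2000 2000
```

Any remainder after integer division falls into the test set, so the three subsets always cover the full dataset exactly once.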