Imagine While Reasoning in Space: Multimodal Visualization-of-Thought
Authors: Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments and ablation studies across three spatial reasoning tasks with newly collected datasets, demonstrating that MVoT exhibits superior adaptability and robustness compared to CoT in complex scenarios. |
| Researcher Affiliation | Collaboration | 1Language Technology Lab, University of Cambridge 2Microsoft Research 3Institute of Automation, Chinese Academy of Sciences. |
| Pseudocode | No | The paper describes methods using mathematical formulations (Equations 1-5) and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We will release the code and the datasets at URL-ANONYMOUS upon acceptance for reproducibility purposes. |
| Open Datasets | No | We will release the code and the datasets at URL-ANONYMOUS upon acceptance for reproducibility purposes. |
| Dataset Splits | Yes | The dataset statistics are presented in Table 4. Detailed information on data collection is provided in App. B. ... Table 4. Statistics of the collected datasets, covering varying levels of complexity in actions and patterns. ... Train Set Size: 5007 / 6400 / 6846; Test Set Size: 1255 / 1604 / 1664 |
| Hardware Specification | Yes | All models were trained on MI300X GPUs. |
| Software Dependencies | Yes | For GPT-4o, we utilized the 2024-07-01 version hosted on the Azure platform, with inference parameters outlined in Table 9. |
| Experiment Setup | Yes | Tables 8 and 9 show the hyper-parameters for training MVoT and doing inference with GPT-4o. ... Table 8. Hyper-parameters of fine-tuning Anole 7B for different system variants. Random Seed: 42; Epochs: 40; Learning Rate: 0.0002; Train Batch Size: 4; Val Batch Size: 16 / 8; Grad Accumulation: 4 / 2; GPUs: 8 / 32 |
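The Experiment Setup row above can be collected into a config sketch. This is a minimal, hedged reconstruction: the numeric values come from the quoted Table 8, but the key names, the `variant_a`/`variant_b` labels for the two system variants, and the `effective_batch_size` helper are illustrative assumptions, not taken from the paper's code.

```python
# Hedged sketch of the Table 8 fine-tuning hyper-parameters (Anole 7B).
# Key names and the variant labels are assumptions; the values are quoted from the paper.
FINETUNE_CONFIG = {
    "base_model": "Anole-7B",
    "random_seed": 42,
    "epochs": 40,
    "learning_rate": 2e-4,
    "train_batch_size": 4,
    # Table 8 lists two values per row for two system variants; the pairing below
    # (first value -> variant_a, second -> variant_b) is an assumption.
    "val_batch_size": {"variant_a": 16, "variant_b": 8},
    "grad_accumulation": {"variant_a": 4, "variant_b": 2},
    "num_gpus": {"variant_a": 8, "variant_b": 32},
}

def effective_batch_size(cfg: dict, variant: str) -> int:
    """Standard effective global batch: per-device batch * grad accumulation * GPUs."""
    return (cfg["train_batch_size"]
            * cfg["grad_accumulation"][variant]
            * cfg["num_gpus"][variant])
```

Under this (assumed) pairing, the two variants would train with effective global batch sizes of 4 × 4 × 8 = 128 and 4 × 2 × 32 = 256 respectively.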