Imagine While Reasoning in Space: Multimodal Visualization-of-Thought

Authors: Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments and ablation studies across three spatial reasoning tasks with newly collected datasets, demonstrating that MVoT exhibits superior adaptability and robustness compared to CoT in complex scenarios.
Researcher Affiliation | Collaboration | ¹Language Technology Lab, University of Cambridge; ²Microsoft Research; ³Institute of Automation, Chinese Academy of Sciences.
Pseudocode | No | The paper describes its methods using mathematical formulations (Equations 1-5) and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | We will release the code and the datasets at URL-ANONYMOUS upon acceptance for reproducibility purposes.
Open Datasets | No | We will release the code and the datasets at URL-ANONYMOUS upon acceptance for reproducibility purposes.
Dataset Splits | Yes | The dataset statistics are presented in Table 4. Detailed information on data collection is provided in App. B. ... Table 4. Statistics of the collected datasets, covering varying levels of complexity in actions and patterns. ... Train set sizes: 5007 / 6400 / 6846; test set sizes: 1255 / 1604 / 1664 (one value per task across the three spatial reasoning tasks).
Hardware Specification | Yes | All models were trained on MI300X GPUs.
Software Dependencies | Yes | For GPT-4o, we utilized the 2024-07-01 version hosted on the Azure platform, with inference parameters outlined in Table 9. (See the hedged inference sketch after this table.)
Experiment Setup | Yes | Tables 8 and 9 show the hyper-parameters for training MVoT and for inference with GPT-4o. ... Table 8. Hyper-parameters for fine-tuning Anole-7B for different system variants: Random Seed 42; Epochs 40; Learning Rate 0.0002; Train Batch Size 4; Val Batch Size 16 / 8; Grad Accumulation 4 / 2; GPUs 8 / 32 (where two values are listed, each corresponds to one system variant). (See the training-config sketch after this table.)
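
For the Software Dependencies row, a minimal sketch of how inference against an Azure-hosted GPT-4o deployment is typically wired up with the official openai Python SDK. The endpoint, deployment name, API-version string, and sampling parameters below are illustrative assumptions; the paper's actual inference parameters are given in its Table 9, which is not reproduced here.

```python
# Hypothetical sketch: querying an Azure-hosted GPT-4o deployment.
# The paper reports using the 2024-07-01 GPT-4o version on Azure with
# inference parameters from its Table 9; every concrete value below
# (API version, deployment name, temperature, max_tokens) is an
# assumption for illustration, not a value taken from the paper.
import os

from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-07-01-preview",  # assumed API-version string
)

response = client.chat.completions.create(
    model="gpt-4o",  # Azure deployment name (assumption)
    messages=[{"role": "user", "content": "Describe the next move in the maze."}],
    temperature=0.0,  # placeholder; actual values are in the paper's Table 9
    max_tokens=1024,  # placeholder
)
print(response.choices[0].message.content)
```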
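
For the Experiment Setup row, a minimal sketch that collects the Table 8 fine-tuning hyper-parameters into a single config structure, assuming the paired values (16/8, 4/2, 8/32) correspond to the two system variants; the field names and variant labels are hypothetical, not the paper's.

```python
# Sketch of the Table 8 hyper-parameters for fine-tuning Anole-7B.
# The dataclass fields and the "variant_a"/"variant_b" labels are
# assumptions; the paper only reports the numbers, in two columns
# for different system variants.
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    random_seed: int = 42
    epochs: int = 40
    learning_rate: float = 2e-4
    train_batch_size: int = 4
    val_batch_size: int = 16
    grad_accumulation: int = 4
    num_gpus: int = 8


# Values from the two columns of Table 8.
variant_a = FinetuneConfig()  # val batch 16, grad accumulation 4, 8 GPUs
variant_b = FinetuneConfig(val_batch_size=8, grad_accumulation=2, num_gpus=32)

# Effective global batch = per-device batch * grad-accumulation steps * GPUs.
for name, cfg in [("variant_a", variant_a), ("variant_b", variant_b)]:
    global_bs = cfg.train_batch_size * cfg.grad_accumulation * cfg.num_gpus
    print(f"{name}: effective train batch size = {global_bs}")
```

Under this reading, the two variants train with effective global batch sizes of 128 and 256 respectively.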