Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
Authors: Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan, Zheng Hui, Jiawei Yao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate the effectiveness of our method for complex reasoning tasks. Our experiments were performed on a single 4090D GPU. |
| Researcher Affiliation | Academia | 1 Shanghai Jiaotong University, 2 Institute of Automation, Chinese Academy of Sciences, 3 Beijing Jiaotong University, 4 Columbia University, 5 University of Washington |
| Pseudocode | No | The paper describes algorithms (Simignore algorithm, image-text token filtering algorithm) and their steps in paragraph form, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/FanshuoZeng/Simignore |
| Open Datasets | Yes | The Science QA (Lu et al. 2022) dataset is currently the only dataset available for complex reasoning and contains 21,208 Q&A multiple-choice questions from elementary and middle school science curricula. |
| Dataset Splits | No | The paper mentions using the Science QA dataset but does not provide specific details on training, validation, or test splits. It only states the dataset contains '21,208 Q&A multiple-choice questions'. |
| Hardware Specification | Yes | Our experiments were performed on a single 4090D GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | Our experiments were performed on a single 4090D GPU. ... As shown in Table 1, we show the results for the different baseline models as well as for the models to which our method is applied. ... Table 2: Accuracy and runtime of LLM when ignoring different numbers of image tokens (baseline: LLaVA1.5-7B). |
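
The image-text token filtering described in the Pseudocode row (the Simignore algorithm, which the paper presents only in paragraph form) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_image_tokens`, the use of mean cosine similarity as the score, and the fixed `keep` budget are all assumptions for the sake of example.

```python
import numpy as np

def filter_image_tokens(image_tokens: np.ndarray,
                        text_tokens: np.ndarray,
                        keep: int) -> np.ndarray:
    """Return indices of the `keep` image tokens most similar to the text.

    image_tokens: (n_img, d) embeddings; text_tokens: (n_txt, d) embeddings.
    Hypothetical sketch of similarity-based filtering: low-similarity image
    tokens are ignored so the LLM attends only to text-relevant ones.
    """
    # L2-normalize so dot products become cosine similarities.
    img = image_tokens / np.linalg.norm(image_tokens, axis=1, keepdims=True)
    txt = text_tokens / np.linalg.norm(text_tokens, axis=1, keepdims=True)
    # Score each image token by its mean similarity to all text tokens.
    scores = (img @ txt.T).mean(axis=1)
    # Keep the top-`keep` image tokens; the rest would be dropped.
    return np.argsort(scores)[::-1][:keep]
```

Under this sketch, varying `keep` corresponds to the "ignoring different numbers of image tokens" experiment reported in Table 2 of the paper.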