Graph4MM: Weaving Multimodal Learning with Structural Information
Authors: Xuying Ning, Dongqi Fu, Tianxin Wei, Wujiang Xu, Jingrui He
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both generative and discriminative tasks show that Graph4MM outperforms larger VLMs, LLMs, and multimodal graph baselines, achieving a 6.93% average improvement. |
| Researcher Affiliation | Collaboration | University of Illinois Urbana-Champaign; Meta AI; Rutgers University. |
| Pseudocode | No | The paper describes the proposed Graph4MM framework using natural language and mathematical equations (e.g., Section 3, Equations 1-13) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/YennNing/Graph4MM. |
| Open Datasets | Yes | For the generative task, we use WIKIWEB2M (Burns et al., 2023)... For the discriminative task, we use ELE-FASHION (Zhu et al., 2024b)... |
| Dataset Splits | Yes | Due to storage constraints, we randomly sample 10K Wikipedia pages, resulting in 13,539 section summary samples for training and 1,768 for testing. ... We sample 10k positive and negative node pairs and use an 8:1:1 train/val/test split. |
| Hardware Specification | Yes | All experiments were conducted on computing nodes equipped with 2 NVIDIA A100 or 2 NVIDIA Ada A6000 GPUs. |
| Software Dependencies | No | The paper mentions software components such as 'CLIP' (vision encoder), 'Prefix-Tuning' (for OPT-125M), and 'LoRA' (for LLaMA-1B), but does not provide version numbers for these or for other dependencies such as the programming language or core deep learning libraries. |
| Experiment Setup | Yes | Table 5. Hyperparameter settings for generative and discriminative tasks. This table includes details such as Learning Rate (1e-4), Max Input Length (1024/512), Max Output Length (128/32), Batch Size (2), Gradient Accumulation Steps (16), LoRA Rank (64), Prefix Tuning Virtual Tokens (20), Attention Diffusion Steps (2), Number of MM-QFormer Block (1), Attention Diffusion α (0.1), Number of Attention Heads (8), and Training Epochs (50 for OPT-125M, 3 for LLaMA-1B). |
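The hyperparameter values quoted above can be collected into a small configuration sketch. This is a minimal illustration assembled from the Table 5 values quoted in this report; the dictionary keys and structure are assumptions rather than the authors' code, and the pairing of the slash-separated lengths (1024/512, 128/32) with the generative and discriminative tasks follows the table's stated ordering.

```python
# Hyperparameters transcribed from the paper's Table 5 (values only;
# the keys and grouping below are illustrative assumptions).
HYPERPARAMS = {
    "shared": {
        "learning_rate": 1e-4,
        "batch_size": 2,
        "gradient_accumulation_steps": 16,
        "attention_diffusion_steps": 2,
        "attention_diffusion_alpha": 0.1,
        "num_mm_qformer_blocks": 1,
        "num_attention_heads": 8,
    },
    "generative": {       # section summarization on WIKIWEB2M
        "max_input_length": 1024,
        "max_output_length": 128,
    },
    "discriminative": {   # node/link task on ELE-FASHION
        "max_input_length": 512,
        "max_output_length": 32,
    },
    "backbones": {
        "OPT-125M": {"adapter": "Prefix-Tuning", "virtual_tokens": 20, "epochs": 50},
        "LLaMA-1B": {"adapter": "LoRA", "lora_rank": 64, "epochs": 3},
    },
}

# The effective batch size is batch_size * gradient_accumulation_steps.
effective_batch = (HYPERPARAMS["shared"]["batch_size"]
                   * HYPERPARAMS["shared"]["gradient_accumulation_steps"])
print(effective_batch)  # 32
```

The effective batch size of 32 (2 × 16) is a derived quantity, not a value stated in the paper's table.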