Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences. [...] Experiments In this section, we detail the comprehensive experiment to validate the effectiveness of our proposed Multimodal Large Language Model for Sequential Multimodal Recommendation (MLLM-MSR).
Researcher Affiliation Collaboration 1 Department of Management Science and Information Systems, Rutgers University; 2 School of Data Science, University of Science and Technology of China; 3 Department of Applied Mathematics and Computational Science, University of Pennsylvania; 4 Bytedance Inc.; 5 School of Computer Science, Georgia Institute of Technology; 6 Department of Computer Science, University of Pittsburgh; 7 Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou)
Pseudocode No The paper describes the method using figures (Figure 1, Figure 2, Figure 3) and prose, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Code https://github.com/YuyangYe/MLLM-MSR
Open Datasets Yes Our experimental evaluation utilized three open-source, real-world datasets from diverse recommendation system domains. These datasets include the Microlens Dataset (Ni et al. 2023), featuring user-item interactions, video introductions, and video cover images; the Amazon-Baby Dataset; and the Amazon-Game Dataset (He and McAuley 2016; McAuley et al. 2015), all of which contain user-item interactions, product descriptions, and images.
Dataset Splits Yes Additionally, we implemented a 1:1 ratio for negative sampling during training and a 1:20 ratio for evaluation. Further details on these datasets are provided in Table 2. [...] all results were obtained using 5-fold cross-validation and various random seeds, and achieved a 95% confidence level.
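The 1:1 training and 1:20 evaluation negative-sampling ratios quoted above can be illustrated with a minimal stdlib-only sketch. The function name `sample_negatives` and the toy catalogue are assumptions for illustration, not code from the paper's repository:

```python
import random

def sample_negatives(positive_item, all_items, ratio, rng):
    """Draw `ratio` negative items per positive interaction,
    excluding the positive item itself from the candidate pool."""
    candidates = [i for i in all_items if i != positive_item]
    return rng.sample(candidates, ratio)

rng = random.Random(0)
items = list(range(100))   # toy item catalogue
pos = 7                    # one observed (positive) interaction

train_negs = sample_negatives(pos, items, 1, rng)   # 1:1 ratio for training
eval_negs = sample_negatives(pos, items, 20, rng)   # 1:20 ratio for evaluation

print(len(train_negs), len(eval_negs))  # 1 20
```

At evaluation time the model then ranks the positive item against its 20 sampled negatives, which is a common protocol for computing top-k metrics in sequential recommendation.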
Hardware Specification Yes Our experiments were performed on a Linux server equipped with eight A800 80GB GPUs.
Software Dependencies Yes We utilized Llava-v1.6-mistral-7b for image description and recommendation tasks, and Llama3-8b-instruct for summarizing user preferences. For the Supervised Fine Tuning (SFT) process, we employed the PyTorch Lightning library, using LoRA with a rank of 8. The optimization was handled by the AdamW optimizer with a learning rate of 2e-5 and a batch size of 1, setting gradient accumulation steps at 8 and epochs at 10. For distributed training, we implemented DeepSpeed [28] with ZeRO stage 2.
Experiment Setup Yes The optimization was handled by the AdamW optimizer with a learning rate of 2e-5 and a batch size of 1, setting gradient accumulation steps at 8 and epochs at 10.
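Because the reported per-device batch size is only 1, the effective batch size is determined by gradient accumulation and data parallelism. A minimal sketch, assuming standard data-parallel semantics (the paper reports per-device batch 1, 8 accumulation steps, and eight A800 GPUs; the formula itself is the usual one, not quoted from the paper):

```python
# Effective batch size under data-parallel training with gradient accumulation:
# each optimizer step sees per_device_batch * grad_accum_steps * num_gpus samples.
per_device_batch = 1   # batch size of 1 (reported)
grad_accum_steps = 8   # gradient accumulation steps (reported)
num_gpus = 8           # eight A800 80GB GPUs (reported hardware)

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 64
```

So although each forward pass processes a single example, every optimizer update aggregates gradients over 64 examples, which is what the AdamW learning rate of 2e-5 is effectively tuned against.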