MTSTRec: Multimodal Time-Aligned Shared Token Recommender

Authors: Ming-Yi Hong, Yen-Jung Hsu, Miao-Chen Chiang, Che Lin

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving upon existing multimodal fusion. Our code is available at https://github.com/idssplab/MTSTRec.
Researcher Affiliation | Academia | ¹Data Science Degree Program, National Taiwan University and Academia Sinica, Taiwan; ²Graduate Institute of Communication Engineering, National Taiwan University, Taiwan; ³Department of Electrical Engineering, National Taiwan University, Taiwan. Correspondence to: Che Lin <EMAIL>.
Pseudocode | No | The paper describes methods in text and uses mathematical formulas, but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | Our code is available at https://github.com/idssplab/MTSTRec.
Open Datasets | Yes | Our experiments utilize three datasets: two proprietary datasets from AviviD Innovative Multimedia (Food E-commerce and Household E-commerce), which have already been made publicly available¹, and one public dataset from H&M. ... ¹The datasets are available at https://github.com/idssplab/MTSTRec. For more details regarding the dataset release, please refer to Appendix O.
Dataset Splits | Yes | The data is split chronologically into 75% for training, 12.5% for validation, and 12.5% for testing based on the purchase orders. For the H&M (Trousers) dataset, which contains only purchase actions, items are sorted by purchase time, and those bought on the last day are used as the answer set, ensuring consistency across all datasets (Meng et al., 2020).
Hardware Specification | No | The paper mentions that computational resources were provided by the National Center for High-performance Computing (NCHC), but does not specify the exact hardware models (e.g., specific GPUs, CPUs, or memory).
Software Dependencies | No | The paper mentions the use of various models and APIs (Llama 3.1, GPT-4o-mini, VGG-19, BERT), but does not provide specific version numbers for general software dependencies or libraries (e.g., Python version, PyTorch/TensorFlow version).
Experiment Setup | Yes | In our experiments, we tuned the hyperparameters based on validation data to ensure optimal performance. The batch size was uniformly set to 64 for all models, and the input dimension d was fixed at 512. We employed the AdamW optimizer, while the maximum sequence length N was set to 20. The fusion layers were standardized across models, with Lfusion = 3 and a dropout rate of 0.1. ... The number of layers in each encoder (Lmod) was tested across values of {2, 4, 8}, and the number of attention heads across {1, 2, 4, 8, 16}. We also experimented with dropout rates of {0.1, 0.2, 0.3} in the hidden layers. The learning rate was tested across a range of {0.001, 0.0005, 0.0001, 0.00005, 0.00001}, while the L2 regularization penalty was tuned from {0.0001, 0.00005, 0.00001, 0.000005, 0.000001}. A gamma value of {0.9, 0.75, 0.5} was set for learning rate decay.
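The chronological 75% / 12.5% / 12.5% split described under Dataset Splits can be sketched as follows. This is a minimal illustration of splitting interactions by purchase time; the function name and interaction tuple format are assumptions for the example, not taken from the MTSTRec codebase.

```python
def chronological_split(interactions, train_frac=0.75, val_frac=0.125):
    """Split (user, item, timestamp) events chronologically into
    train / validation / test partitions by purchase order."""
    events = sorted(interactions, key=lambda e: e[2])  # sort by timestamp
    n = len(events)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return events[:train_end], events[train_end:val_end], events[val_end:]

# Toy example: 8 purchase events -> 6 train, 1 validation, 1 test
purchases = [("u1", "i1", 1), ("u1", "i2", 2), ("u2", "i3", 3), ("u2", "i4", 4),
             ("u1", "i5", 5), ("u2", "i6", 6), ("u1", "i7", 7), ("u2", "i8", 8)]
train, val, test = chronological_split(purchases)
```

Because the split is by time rather than by random sampling, the test partition always contains the most recent purchases, matching the paper's evaluation protocol.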
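The Experiment Setup row enumerates a full grid of tuned hyperparameters alongside fixed ones. A minimal sketch of that search space, with illustrative key names (the actual MTSTRec training configuration may use different names):

```python
from itertools import product

# Fixed settings quoted in the paper's Experiment Setup description.
fixed = {"batch_size": 64, "d_model": 512, "max_seq_len": 20,
         "fusion_layers": 3, "fusion_dropout": 0.1, "optimizer": "AdamW"}

# Tuned hyperparameters and their reported candidate values.
grid = {
    "encoder_layers": [2, 4, 8],
    "attention_heads": [1, 2, 4, 8, 16],
    "hidden_dropout": [0.1, 0.2, 0.3],
    "learning_rate": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
    "l2_penalty": [1e-4, 5e-5, 1e-5, 5e-6, 1e-6],
    "lr_decay_gamma": [0.9, 0.75, 0.5],
}

# Enumerate every combination of tuned values, merged with the fixed settings.
configs = [dict(zip(grid, vals), **fixed) for vals in product(*grid.values())]
print(len(configs))  # 3 * 5 * 3 * 5 * 5 * 3 = 3375 candidate configurations
```

Enumerating the grid makes the cost of the reported search explicit: an exhaustive sweep covers 3,375 configurations per dataset, which is why validation-based selection (as the paper states) is the practical approach.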