SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Authors: Wujiang Xu, Qitian Wu, Zujie Liang, Jiaojiao Han, Xuying Ning, Yunxiao Shi, Wenfang Lin, Yongfeng Zhang

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results illustrate that the proposed SLMRec model attains the best performance using only 13% of the parameters found in LLM-based recommendation models, while simultaneously achieving up to 6.6x and 8.0x speedups in training and inference time costs, respectively. Besides, we provide a theoretical justification for why small language models can perform comparably to large language models in SR.
Researcher Affiliation | Collaboration | Wujiang Xu (1), Qitian Wu (2), Zujie Liang (3), Jiaojiao Han (4), Xuying Ning (5), Yunxiao Shi (6), Wenfang Lin (3), Yongfeng Zhang (1) — (1) Rutgers University; (2) Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; (3) Ant Group; (4) Dian Diagnostics Group Co.; (5) University of Illinois Urbana-Champaign; (6) University of Technology Sydney
Pseudocode | No | The paper describes its methodology through mathematical equations and textual explanations, but it does not contain a distinct section, figure, or block explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code and datasets are available at https://github.com/WujiangXu/SLMRec
Open Datasets | Yes | To obtain large-scale industry data, we use the Amazon 18 version dataset (https://nijianmo.github.io/amazon/index.html) in this paper. More details are shown in Section 5.
Dataset Splits | Yes | The historical sequence of interactions for each user is divided into three segments: (1) the most recent interaction is reserved for testing, (2) the second most recent for validation, and (3) all preceding interactions are used for training. [...] In order to ensure an unbiased evaluation, we adopt the methodology employed in previous works (Krichene & Rendle, 2020; Zhao et al., 2020), wherein we randomly select 999 negative items (i.e., items that the user has not interacted with) and combine them with 1 positive item (i.e., a ground-truth interaction) to form our recommendation candidates for the ranking test.
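The leave-one-out split and sampled-candidate ranking protocol quoted above can be sketched as follows. This is a minimal illustration; the function names and toy data are ours, not from the paper's code.

```python
import random

def leave_one_out_split(user_seq):
    """Split one user's chronological interaction sequence:
    last item -> test, second-to-last -> validation, rest -> train."""
    assert len(user_seq) >= 3, "need at least 3 interactions"
    return user_seq[:-2], user_seq[-2], user_seq[-1]

def sample_ranking_candidates(positive, interacted, all_items, n_neg=999, seed=0):
    """Pair the ground-truth item with n_neg randomly sampled negatives,
    i.e. items the user has never interacted with."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    negatives = rng.sample(pool, n_neg)
    return [positive] + negatives

# Toy usage: 5 interactions, 5000-item catalog.
seq = [3, 7, 9, 11, 2]
train, val, test = leave_one_out_split(seq)   # [3, 7, 9], 11, 2
candidates = sample_ranking_candidates(test, set(seq), range(5000))
assert len(candidates) == 1000 and candidates[0] == test
```

The model then ranks the positive item against the 999 negatives, which is the candidate-set construction the quoted evaluation protocol describes.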
Hardware Specification | Yes | We use mixed precision training and train on 1*80G Nvidia A100 GPU.
Software Dependencies | No | Our implementation is based on Huggingface Transformers. The paper mentions "Huggingface Transformers" but does not specify a version number for this or any other software dependency.
Experiment Setup | Yes | In Table 6, we provide hyper-parameters in our training stage. Our implementation is based on Huggingface Transformers. The input and intermediate hidden dimension in the feed-forward network is 4096. We use mixed precision training and train on 1*80G Nvidia A100 GPU.
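As a hedged illustration of the quoted setup, a mixed-precision run on a single A100 could be configured through Hugging Face `TrainingArguments` roughly as below. Only the precision flag reflects the paper's stated setup; the output path, batch size, and epoch count are placeholders, not values from the paper's Table 6.

```python
from transformers import TrainingArguments

# Sketch of a mixed-precision configuration on one 80 GB A100.
# bf16=True enables bfloat16 mixed precision (fp16=True is the fp16 variant);
# all other values below are placeholder assumptions.
args = TrainingArguments(
    output_dir="slmrec-ckpt",       # placeholder path
    bf16=True,                      # mixed precision, as quoted in the setup
    per_device_train_batch_size=8,  # assumption, not from the paper
    num_train_epochs=3,             # assumption, not from the paper
)
```

The resulting `args` object would then be passed to a `Trainer` alongside the model and datasets; the paper itself reports only that mixed precision and a single A100 were used.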