SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Authors: Wujiang Xu, Qitian Wu, Zujie Liang, Jiaojiao Han, Xuying Ning, Yunxiao Shi, Wenfang Lin, Yongfeng Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experimental results illustrate that the proposed SLMRec model attains the best performance using only 13% of the parameters found in LLM-based recommendation models, while simultaneously achieving up to 6.6x and 8.0x speedups in training and inference time costs, respectively. Besides, we provide a theoretical justification for why small language models can perform comparably to large language models in SR. |
| Researcher Affiliation | Collaboration | Wujiang Xu (1), Qitian Wu (2), Zujie Liang (3), Jiaojiao Han (4), Xuying Ning (5), Yunxiao Shi (6), Wenfang Lin (3), Yongfeng Zhang (1). Affiliations: (1) Rutgers University; (2) Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; (3) Ant Group; (4) Dian Diagnostics Group Co.; (5) University of Illinois Urbana-Champaign; (6) University of Technology Sydney |
| Pseudocode | No | The paper describes its methodology through mathematical equations and textual explanations, but it does not contain a distinct section, figure, or block explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The source code and datasets are available at https://github.com/WujiangXu/SLMRec |
| Open Datasets | Yes | To obtain large-scale industry data, we use the Amazon 2018-version dataset (https://nijianmo.github.io/amazon/index.html) in this paper. More details are shown in Section 5. |
| Dataset Splits | Yes | The historical sequence of interactions for each user is divided into three segments: (1) the most recent interaction is reserved for testing, (2) the second most recent for validation, and (3) all preceding interactions are used for training. [...] In order to ensure an unbiased evaluation, we adopt the methodology employed in previous works (Krichene & Rendle, 2020; Zhao et al., 2020), wherein we randomly select 999 negative items (i.e., items that the user has not interacted with) and combine them with 1 positive item (i.e., a ground-truth interaction) to form our recommendation candidates for the ranking test. |
| Hardware Specification | Yes | We use mixed precision training and train on a single 80GB NVIDIA A100 GPU. |
| Software Dependencies | No | The paper states "Our implementation is based on Huggingface Transformers" but does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | In Table 6, we provide hyper-parameters in our training stage. Our implementation is based on Huggingface Transformers. The input and intermediate hidden dimension in the feed-forward network is 4096. We use mixed precision training and train on a single 80GB NVIDIA A100 GPU. |
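The dataset-split protocol quoted above (leave-one-out split per user, plus 999 sampled negatives combined with the one ground-truth item for the ranking test) can be sketched as follows. This is a minimal illustration of the evaluation protocol described in the paper, not code from the SLMRec repository; the function names and the `seed` parameter are hypothetical.

```python
import random

def leave_one_out_split(user_seq):
    """Split one user's chronological interaction sequence:
    the most recent item is reserved for testing, the second most
    recent for validation, and all preceding items for training."""
    assert len(user_seq) >= 3, "need at least 3 interactions per user"
    return user_seq[:-2], user_seq[-2], user_seq[-1]

def build_ranking_candidates(positive, interacted, all_items, n_neg=999, seed=0):
    """Form the candidate set for the ranking test: 1 ground-truth
    item plus n_neg sampled negatives, i.e. items the user has never
    interacted with (sampled-metrics setup of Krichene & Rendle, 2020)."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    negatives = rng.sample(pool, n_neg)
    return [positive] + negatives
```

A model is then asked to rank the 1,000 candidates, and metrics such as HR@K / NDCG@K check where the positive item lands.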