Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model
Authors: Chuyuan Wei, Ke Duan, Shengda Zhuo, Hongchun Wang, Shuqiang Huang, Jie Liu
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental validation on two real-world datasets demonstrates the efficacy of our approach, significantly enhancing both the accuracy and robustness of recommendations compared to state-of-the-art methods. |
| Researcher Affiliation | Academia | Chuyuan Wei (College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China); Ke Duan (College of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China); Shengda Zhuo (Corresponding Author; College of Cyber Security, Jinan University, Guangzhou, Guangdong, China); Hongchun Wang (College of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing, China); Shuqiang Huang (Corresponding Author; College of Cyber Security, Jinan University, Guangzhou, Guangdong, China); Jie Liu (North China University of Technology, Beijing, China) |
| Pseudocode | Yes | Algorithm 1 Training Procedure of ER2ALM Input: Item set I, User set U, Historical interaction set HU Output: Top-k Recommendations |
| Open Source Code | No | This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs. The AdamW optimizer (Paszke et al., 2019) was employed for training, with learning rates ranging from [5×10⁻⁵, 1×10⁻³] for the Netflix dataset and [2.5×10⁻⁴, 9.5×10⁻⁴] for the MovieLens dataset. For the LLMs parameters, the temperature was selected from {0.4, 0.8, 1}, aiming to balance the accuracy and richness of the generated content. The top-p value, used to control generation precision, was chosen from {0.6, 0.8, 1}. To maintain response integrity, data flow was disabled. For embedding generation, we utilized a 1024-dimensional RoBERTa model to capture more detailed content. For noise reduction, the threshold was set to 0.4, with similarity judgments for distress added once the number of trusted embeddings reached 500. |
| Open Datasets | Yes | We conduct experiments using publicly available datasets, Netflix and MovieLens 10M (ML-10M), both of which contain basic information about the movies. Netflix, released by Netflix, contains over 100 million anonymous movie ratings collected from users (https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data). MovieLens is a widely used series of benchmark datasets in recommendation system tasks. ML-10M contains 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users (https://grouplens.org/datasets/movielens/10m/). |
| Dataset Splits | No | To mitigate potential biases in the test sampling process, we adopt the all-ranking evaluation strategy (Wei et al., 2021a, 2020). |
| Hardware Specification | No | This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs. |
| Software Dependencies | Yes | This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs (https://huggingface.co/THUDM/chatglm3-6b). |
| Experiment Setup | Yes | In this part, we provide a concise overview of the general experimental setup, including details on the datasets, evaluation protocols, comparative baselines, and implementation specifics. Implementation Details. This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs. The AdamW optimizer (Paszke et al., 2019) was employed for training, with learning rates ranging from [5×10⁻⁵, 1×10⁻³] for the Netflix dataset and [2.5×10⁻⁴, 9.5×10⁻⁴] for the MovieLens dataset. For the LLMs parameters, the temperature was selected from {0.4, 0.8, 1}, aiming to balance the accuracy and richness of the generated content. The top-p value, used to control generation precision, was chosen from {0.6, 0.8, 1}. To maintain response integrity, data flow was disabled. For embedding generation, we utilized a 1024-dimensional RoBERTa model to capture more detailed content. For noise reduction, the threshold was set to 0.4, with similarity judgments for distress added once the number of trusted embeddings reached 500. |
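The quoted setup mentions a noise-reduction step with a similarity threshold of 0.4 applied once a pool of 500 trusted embeddings is available. The paper's exact procedure is not reproduced here; the following is a minimal sketch under the assumption that candidates are kept when their cosine similarity to the centroid of the trusted pool exceeds the threshold. The function names (`cosine_sim`, `filter_noisy`) and the centroid pooling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Assumed threshold from the quoted setup; the filtering rule itself
# is a hypothetical reconstruction, not the paper's published code.
THRESHOLD = 0.4

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_noisy(candidates: np.ndarray, trusted: np.ndarray,
                 threshold: float = THRESHOLD) -> np.ndarray:
    """Keep candidate embeddings whose cosine similarity to the
    centroid of the trusted pool exceeds `threshold`; drop the rest
    as noise."""
    centroid = trusted.mean(axis=0)
    keep = [cosine_sim(c, centroid) > threshold for c in candidates]
    return candidates[np.array(keep)]

# Toy example with 4-dimensional embeddings (the paper uses 1024-d
# RoBERTa embeddings and a trusted pool of 500).
trusted = np.tile([1.0, 0.0, 0.0, 0.0], (500, 1))
cands = np.array([[0.9, 0.1, 0.0, 0.0],   # similar to the pool -> kept
                  [0.0, 1.0, 0.0, 0.0]])  # orthogonal -> dropped
print(len(filter_noisy(cands, trusted)))  # prints 1
```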