Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model

Authors: Chuyuan Wei, Ke Duan, Shengda Zhuo, Hongchun Wang, Shuqiang Huang, Jie Liu

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental validation on two real-world datasets demonstrates the efficacy of our approach, significantly enhancing both the accuracy and robustness of recommendations compared to state-of-the-art methods."
Researcher Affiliation | Academia | Chuyuan Wei, College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Ke Duan, College of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Shengda Zhuo (Corresponding Author), College of Cyber Security, Jinan University, Guangzhou, Guangdong, China; Hongchun Wang, College of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing, China; Shuqiang Huang (Corresponding Author), College of Cyber Security, Jinan University, Guangzhou, Guangdong, China; Jie Liu, North China University of Technology, Beijing, China
Pseudocode | Yes | "Algorithm 1: Training Procedure of ER2ALM. Input: item set I, user set U, historical interaction set H_U. Output: top-k recommendations."
Open Source Code | No | "This study employs the locally deployed ChatGLM3-6B³ model to enhance data through LLM-generated dialogs. The AdamW optimizer (Paszke et al., 2019) was employed for training, with learning rates ranging over [5×10⁻⁵, 1×10⁻³] for the Netflix dataset and [2.5×10⁻⁴, 9.5×10⁻⁴] for the MovieLens dataset. For the LLM parameters, the temperature was selected from {0.4, 0.8, 1} to balance the accuracy and richness of the generated content. The top-p value, used to control generation precision, was chosen from {0.6, 0.8, 1}. To maintain response integrity, data flow was disabled. For embedding generation, we utilized a 1024-dimensional RoBERTa model to capture more detailed content. For noise reduction, the threshold was set to 0.4, with similarity judgments applied once the number of trusted embeddings reached 500."
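The top-p values quoted above control nucleus sampling during dialog generation: only the smallest set of highest-probability tokens whose cumulative mass reaches p is kept, and the rest are discarded before sampling. A minimal stand-alone sketch of that filtering step (the function name and toy distribution are illustrative, not from the paper's code):

```python
def top_p_filter(probs, p):
    """Nucleus (top-p) filtering: keep the smallest set of tokens whose
    cumulative probability reaches p, then renormalize the kept mass."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Example: with top-p = 0.8 only the two most likely tokens survive.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.8)
```

A lower p (e.g. 0.6 from the paper's grid) prunes more aggressively and yields more conservative generations; p = 1 disables the filter entirely.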
Open Datasets | Yes | "We conduct experiments using publicly available datasets, Netflix and MovieLens 10M (ML-10M), both of which contain basic information about the movies. Netflix¹, released by Netflix, contains over 100 million anonymous movie ratings collected from users. MovieLens is a widely used series of benchmark datasets in recommendation-system tasks. ML-10M² contains 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users." (1) https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data (2) https://grouplens.org/datasets/movielens/10m/
Dataset Splits | No | "To mitigate potential biases in the test sampling process, we adopt the all-ranking evaluation strategy (Wei et al., 2021a, 2020)."
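Under the all-ranking strategy quoted above, each held-out item is ranked against the full item catalog rather than a sampled set of negatives, which removes the bias introduced by negative sampling. A minimal sketch of the two metrics most commonly reported under this protocol, HR@k and NDCG@k (the exact metric set used by the paper is not stated in this excerpt):

```python
import math

def hit_ratio_at_k(scores, target, k):
    """All-ranking HR@k: score *every* item, then check whether the
    held-out target item lands in the top-k of the full ranking."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(scores, target, k):
    """All-ranking NDCG@k with a single relevant item: discounted gain
    1 / log2(rank + 2) if the target is in the top-k, else 0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if target in ranked[:k]:
        return 1.0 / math.log2(ranked.index(target) + 2)
    return 0.0

# Toy catalog of 4 items; item 2 is the held-out positive.
hr = hit_ratio_at_k([0.1, 0.9, 0.5, 0.3], target=2, k=2)    # → 1.0
```

Averaging these per-user values over all test users gives the dataset-level metrics.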
Hardware Specification | No | "This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs."
Software Dependencies | Yes | "This study employs the locally deployed ChatGLM3-6B³ model to enhance data through LLM-generated dialogs." (3) https://huggingface.co/THUDM/chatglm3-6b
Experiment Setup | Yes | "In this part, we provide a concise overview of the general experimental setup, including details on the datasets, evaluation protocols, comparative baselines, and implementation specifics. Implementation Details: This study employs the locally deployed ChatGLM3-6B model to enhance data through LLM-generated dialogs. The AdamW optimizer (Paszke et al., 2019) was employed for training, with learning rates ranging over [5×10⁻⁵, 1×10⁻³] for the Netflix dataset and [2.5×10⁻⁴, 9.5×10⁻⁴] for the MovieLens dataset. For the LLM parameters, the temperature was selected from {0.4, 0.8, 1} to balance the accuracy and richness of the generated content. The top-p value, used to control generation precision, was chosen from {0.6, 0.8, 1}. To maintain response integrity, data flow was disabled. For embedding generation, we utilized a 1024-dimensional RoBERTa model to capture more detailed content. For noise reduction, the threshold was set to 0.4, with similarity judgments applied once the number of trusted embeddings reached 500."
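The noise-reduction settings quoted above (threshold 0.4, activation at 500 trusted embeddings) suggest a similarity gate over a growing pool of trusted embeddings. The excerpt does not spell out the exact rule, so the sketch below is one plausible reading, assuming cosine similarity and a max-similarity criterion; the function names and pooling rule are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_noisy(embedding, trusted, threshold=0.4, min_pool=500):
    """Flag an embedding as noise when its best similarity to the trusted
    pool falls below the threshold. Judgments are deferred until the pool
    reaches min_pool entries (500 and 0.4 follow the reported settings;
    the max-similarity rule itself is an assumption)."""
    if len(trusted) < min_pool:
        return False  # too few trusted embeddings to judge reliably
    return max(cosine(embedding, t) for t in trusted) < threshold
```

Deferring judgments until the pool is large enough avoids rejecting valid embeddings early on, when the trusted set is still unrepresentative.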