CoT4Rec: Revealing User Preferences Through Chain of Thought for Recommender Systems

Authors: Weiqi Yue, Yuyu Yin, Xin Zhang, Binbin Shi, Tingting Liang, Jian Wan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CoT4Rec demonstrates superior performance over existing state-of-the-art models in recommendation tasks across four public datasets, achieving improvements ranging from 2.2% to 12.2%. ... Supporting paper sections: Experiment (Experimental Setup, Datasets, Baselines, Evaluation Metrics, Implementation Details, Overall Performance, Ablation Study, Further Analysis).
Researcher Affiliation | Academia | Hangzhou Dianzi University, Hangzhou, China
Pseudocode | No | The paper describes the CoT reasoning strategy and recommendation model architecture using figures (Figure 2, Figure 3) and textual descriptions, but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Code: https://github.com/815382636/CoT4Rec
Open Datasets | Yes | To evaluate the effectiveness of CoT4Rec, comprehensive experiments are conducted on four public datasets from various sources: 1) MovieLens [1]: MovieLens is a widely used public dataset focusing on movie ratings, with data provided by users of IMDB and movie databases. Two benchmark datasets, ML100K and ML1M, are selected, containing approximately 100,000 and 1,000,000 user interactions, respectively. 2) Amazon [2]: The Amazon dataset includes user reviews and ratings from the Amazon platform across 24 different categories. Movies and Electronics, two million-level categories, are chosen as the benchmark datasets. ... [1] https://grouplens.org/datasets/movielens [2] http://jmcauley.ucsd.edu/data/amazon
Dataset Splits | Yes | Following previous works (Shi et al. 2020), we conduct data preprocessing on four datasets and apply the leave-one-out strategy for evaluation. Specifically, for each user's historical behavior sequence, the most recent item and the second most recent item are used as the test data and validation data, respectively. The remaining historical behavior sequence is used for training. For each correct interaction item in the validation and test sets, 100 unrelated items are randomly selected and shuffled with it.
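The leave-one-out split and negative-sampling protocol quoted above can be sketched as follows. This is a minimal illustration; the function names and toy data are assumptions for clarity, not taken from the CoT4Rec codebase:

```python
import random

def leave_one_out_split(user_seq):
    """Most recent item -> test, second most recent -> validation,
    the rest of the chronological sequence -> training."""
    assert len(user_seq) >= 3, "need at least 3 interactions"
    return user_seq[:-2], user_seq[-2], user_seq[-1]

def build_candidates(positive_item, all_items, interacted, n_neg=100, seed=0):
    """Pair the ground-truth item with n_neg randomly sampled unrelated
    (non-interacted) items, then shuffle them together for ranking."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    candidates = rng.sample(pool, n_neg) + [positive_item]
    rng.shuffle(candidates)
    return candidates

# Toy usage: one user with 5 interactions, item IDs drawn from a catalog of 1000.
seq = [10, 20, 30, 40, 50]
train_seq, val_item, test_item = leave_one_out_split(seq)
candidates = build_candidates(test_item, range(1000), set(seq))
```

The model is then evaluated on how highly it ranks the single true item among the 101 shuffled candidates.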
Hardware Specification | Yes | The experiments are conducted on 2 NVIDIA V6000 48G GPUs.
Software Dependencies | No | The paper mentions specific models like GPT-3.5-turbo-1106 and flan-t5-base, and optimizers like AdamW, but it does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or other general software dependencies used for the implementation.
Experiment Setup | Yes | For the CoT Reasoning Strategy, we employ GPT-3.5-turbo-1106 as the interface for LLM calls. For data clustering, we employ a sliding window to segment each user's historical interactions into user fragments. The sliding window size is set to 5, with a stride of 3 for each slide. Then, we apply the k-means algorithm to cluster these fragments into 6 clusters. ... We employ the AdamW optimizer (Loshchilov and Hutter 2017) to adaptively adjust the model, setting the maximum number of epochs to 10, the learning rate to 5e-5, and the weight decay to 0.01.
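The sliding-window segmentation and clustering steps quoted above can be sketched as follows. This is a self-contained toy illustration: the embedding of fragments is omitted, and a minimal k-means stands in for whatever library implementation the authors actually used:

```python
import random

def sliding_windows(history, size=5, stride=3):
    """Segment a user's interaction history into fragments
    (window size 5, stride 3, per the paper's setup)."""
    return [history[i:i + size] for i in range(0, len(history) - size + 1, stride)]

def kmeans(points, k=6, iters=20, seed=0):
    """Minimal k-means over fragment representations. In practice each
    fragment would first be mapped to an embedding vector; here the
    points are assumed to be such vectors already."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters

# Toy usage: a 12-item history yields 3 overlapping fragments.
fragments = sliding_windows(list(range(12)))
# Toy 2-D "embeddings" clustered into 2 groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(pts, k=2)
```

With the paper's settings (size 5, stride 3, k = 6), consecutive fragments overlap by two items, so clusters can capture gradual shifts in user preference across the history.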