CoT4Rec: Revealing User Preferences Through Chain of Thought for Recommender Systems

Authors: Weiqi Yue, Yuyu Yin, Xin Zhang, Binbin Shi, Tingting Liang, Jian Wan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | CoT4Rec demonstrates superior performance over existing state-of-the-art models in recommendation tasks across four public datasets, achieving improvements ranging from 2.2% to 12.2%. ... Supporting paper sections: Experiment (Experimental Setup, Datasets, Baselines, Evaluation Metrics, Implementation Details, Overall Performance, Ablation Study, Further Analysis).
Researcher Affiliation | Academia | Hangzhou Dianzi University, Hangzhou, China
Pseudocode | No | The paper describes the CoT reasoning strategy and recommendation model architecture using figures (Figure 2, Figure 3) and textual descriptions, but it does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Code: https://github.com/815382636/CoT4Rec
Open Datasets | Yes | To evaluate the effectiveness of CoT4Rec, comprehensive experiments are conducted on four public datasets from various sources: 1) MovieLens [1]: MovieLens is a widely used public dataset focusing on movie ratings, with data provided by users of IMDB and movie databases. Two benchmark datasets, ML100K and ML1M, are selected, containing approximately 100,000 and 1,000,000 user interactions, respectively. 2) Amazon [2]: The Amazon dataset includes user reviews and ratings from the Amazon platform across 24 different categories. Movies and Electronics, two million-level categories, are chosen as the benchmark datasets. ... [1] https://grouplens.org/datasets/movielens [2] http://jmcauley.ucsd.edu/data/amazon
Dataset Splits | Yes | Following previous works (Shi et al. 2020), we conduct data preprocessing on four datasets and apply the leave-one-out strategy for evaluation. Specifically, for each user's historical behavior sequence, the most recent item and the second most recent item are used as the test data and validation data, respectively. The remaining historical behavior sequence is used for training. For each correct interaction item in the validation and test sets, 100 unrelated items are randomly selected and shuffled with it.
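The leave-one-out split and negative-sampling protocol quoted above can be sketched as follows. This is a minimal illustration; the function names and toy data are assumptions for clarity, not taken from the CoT4Rec codebase:

```python
import random

def leave_one_out_split(user_seq):
    """Most recent item -> test, second most recent -> validation,
    the rest of the chronological sequence -> training."""
    assert len(user_seq) >= 3, "need at least 3 interactions"
    return user_seq[:-2], user_seq[-2], user_seq[-1]

def build_candidates(positive_item, all_items, interacted, n_neg=100, seed=0):
    """Pair the ground-truth item with n_neg randomly sampled unrelated
    (non-interacted) items, then shuffle them together for ranking."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    candidates = rng.sample(pool, n_neg) + [positive_item]
    rng.shuffle(candidates)
    return candidates

# Toy usage: one user with 5 interactions, item IDs drawn from a catalog of 1000.
seq = [10, 20, 30, 40, 50]
train_seq, val_item, test_item = leave_one_out_split(seq)
candidates = build_candidates(test_item, range(1000), set(seq))
```

The model is then evaluated on how highly it ranks the single true item among the 101 shuffled candidates.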
Hardware Specification | Yes | The experiments are conducted on 2 NVIDIA V6000 48G GPUs.
Software Dependencies | No | The paper mentions specific models like GPT-3.5-turbo-1106 and flan-t5-base, and optimizers like AdamW, but it does not provide specific version numbers for software libraries (e.g., PyTorch, TensorFlow) or other general software dependencies used for the implementation.
Experiment Setup | Yes | For the CoT Reasoning Strategy, we employ GPT-3.5-turbo-1106 as the interface for LLM calls. For data clustering, we employ a sliding window to segment each user's historical interactions into user fragments. The sliding window size is set to 5, with a stride of 3 for each slide. Then, we apply the k-means algorithm to cluster these fragments into 6 clusters. ... We employ the AdamW optimizer (Loshchilov and Hutter 2017) to adaptively adjust the model, setting the maximum number of epochs to 10, the learning rate to 5e-5, and the weight decay to 0.01.
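The sliding-window segmentation and clustering steps quoted above can be sketched as follows. This is a self-contained toy illustration: the embedding of fragments is omitted, and a minimal k-means stands in for whatever library implementation the authors actually used:

```python
import random

def sliding_windows(history, size=5, stride=3):
    """Segment a user's interaction history into fragments
    (window size 5, stride 3, per the paper's setup)."""
    return [history[i:i + size] for i in range(0, len(history) - size + 1, stride)]

def kmeans(points, k=6, iters=20, seed=0):
    """Minimal k-means over fragment representations. In practice each
    fragment would first be mapped to an embedding vector; here the
    points are assumed to be such vectors already."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters

# Toy usage: a 12-item history yields 3 overlapping fragments.
fragments = sliding_windows(list(range(12)))
# Toy 2-D "embeddings" clustered into 2 groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(pts, k=2)
```

With the paper's settings (size 5, stride 3, k = 6), consecutive fragments overlap by two items, so clusters can capture gradual shifts in user preference across the history.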