LLM-Powered User Simulator for Recommender System

Authors: Zijian Zhang, Shuchang Liu, Ziru Liu, Rui Zhong, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Qidong Liu, Peng Jiang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We establish quantifying and qualifying experiments on five datasets to validate the simulator's effectiveness and stability across various recommendation scenarios.
Researcher Affiliation | Collaboration | 1 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; 2 Kuaishou Technology; 3 City University of Hong Kong; 4 Xi'an Jiaotong University
Pseudocode | No | The paper describes the methodology in narrative text and via prompt templates, but does not include any clearly labeled pseudocode or algorithm blocks for its models or procedures.
Open Source Code | Yes | Code: https://github.com/Applied-Machine-Learning-Lab/LLM_User_Simulator
Open Datasets | Yes | To verify the efficacy of the proposed ensemble user simulator, we incorporate datasets from diverse fields: Yelp¹ (the state of Missouri), Amazon² (Digital Music, Video Games, and Movies), and Anime³. Dataset statistics are shown in Table ??. (...) ¹https://www.yelp.com/dataset/documentation/main ²https://nijianmo.github.io/amazon/index.html ³https://www.kaggle.com/datasets/CooperUnion/anime-recommendations-database
Dataset Splits | No | The paper uses various public datasets (Yelp, Amazon Music, Amazon Games, Amazon Movie, Anime) and converts ratings to binary format. However, it does not explicitly state the specific training, validation, or test dataset splits (e.g., percentages, sample counts, or methodology for splitting) used for the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments. It only mentions using ChatGLM-6B as the LLM.
Software Dependencies | No | The paper mentions several models and algorithms used (e.g., ChatGLM-6B, BERT, SASRec, A2C, DQN, PPO, TRPO) but does not provide specific version numbers for these software components or any programming languages/libraries used for their implementation.
Experiment Setup | No | The paper evaluates various reinforcement learning algorithms but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or other training configurations.
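The Dataset Splits row notes that the paper converts ratings to binary format without documenting the procedure. A minimal sketch of such a conversion is shown below; the threshold of 4.0 and the tuple layout are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: turning explicit ratings into binary (implicit) labels.
# The paper states ratings are binarized but does not give the threshold;
# 4.0 on a 1-5 scale is an assumed, commonly used cutoff.
def binarize_ratings(ratings, threshold=4.0):
    """ratings: list of (user, item, rating) tuples.

    Returns (user, item, label) tuples where label is 1 for a positive
    interaction (rating >= threshold) and 0 otherwise.
    """
    return [(u, i, 1 if r >= threshold else 0) for (u, i, r) in ratings]

data = [(1, 10, 5.0), (1, 11, 2.0), (2, 10, 4.0)]
print([label for (_, _, label) in binarize_ratings(data)])  # [1, 0, 1]
```

Reporting this threshold (and the train/validation/test split) alongside the code would close the reproducibility gap flagged in the row above.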