Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation
Authors: Ziyan Wang, Yingpeng Du, Zhu Sun, Haoyan Chua, Kaidong Feng, Wenya Wang, Jie Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we conduct experiments on real-world datasets and demonstrate the superiority of our Re2LLM over state-of-the-art methods. Re2LLM outperforms state-of-the-art methods, including deep learning-based and LLM-based models, in both few-shot and full-data settings across two real-world datasets. |
| Researcher Affiliation | Academia | 1Nanyang Technological University 2Singapore University of Technology and Design 3Yanshan University |
| Pseudocode | Yes | The overall pseudocode of Re2LLM is in Appendix B. |
| Open Source Code | Yes | Our code and data are available in the Supplementary Material. |
| Open Datasets | Yes | We evaluate Re2LLM and baselines on two real-world datasets. Movie (Hetrec2011-Movielens) contains user ratings of movies and side information such as title, production year, and genre. Game (Video Games of the Amazon Review Dataset) contains users' reviews on various types of games and peripherals, and metadata such as title, brand, and tag. The statistics are in Table 1. |
| Dataset Splits | Yes | For each dataset, we apply the split-by-ratio strategy following (Sun et al. 2020) to obtain training, validation, and test sets by 7:1:2. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or detailed machine specifications. It mentions using the gpt-4 API and discusses computational requirements in an appendix that is not provided. |
| Software Dependencies | No | The paper mentions using Optuna for hyperparameter optimization and BERT as a text encoder, but does not provide specific version numbers for these or any other key software libraries or frameworks. It also uses the 'gpt-4 API', which is a service rather than a software dependency with a version. |
| Experiment Setup | Yes | We conduct 20 trials to search for learning rate, weight decay, and batch size. For our method Re2LLM, we set the knowledge base size to 20, and the few-shot training size to 500. For the full dataset setting, we use the entire training set. For the few-shot setting, we sample 500 training samples from the entire training set for all methods. We run 5 experiments with different random seeds and show the average performance. All methods are optimized by the Adam optimizer. |