Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens
Authors: Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen, Kai-Qi Liu, De-Chuan Zhan, Han-Jia Ye
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate META-ID on five downstream recommendation tasks: sequential recommendation, direct recommendation, rating prediction, explanation generation, and review-related tasks. We analyze the influence of critical components in META-ID and assess the ID representations through visualization and our proposed metrics. Table 1 presents our findings for the sequential recommendation task. |
| Researcher Affiliation | Collaboration | 1School of Artificial Intelligence, Nanjing University 2National Key Laboratory for Novel Software Technology, Nanjing University 3WeChat Technical Architecture Department, Tencent Inc. 4Software Institute, Nanjing University. |
| Pseudocode | No | The paper describes the methodology, including meta-path-based embedding and OOV token generation, through descriptive text and diagrams, but does not include a formally structured pseudocode block or algorithm. |
| Open Source Code | Yes | Code is available at https://github.com/Tingji2419/META-ID. |
| Open Datasets | Yes | We evaluate our META-ID framework on three public real-world datasets from the Amazon Product Reviews dataset (Ni et al., 2019), focusing specifically on Sports, Beauty, and Toys. The datasets are processed following the methodology in P5 (Geng et al., 2022). https://nijianmo.github.io/amazon |
| Dataset Splits | Yes | For rating, explanation, and review task families, we randomly split each dataset into training (80%), validation (10%) and testing (10%) sets, and ensure that there is at least one instance included in the training set for each user and item. |
| Hardware Specification | Yes | For LLM fine-tuning, we pre-train T5 for 10 epochs using the AdamW optimizer on two NVIDIA RTX 3090 GPUs with a batch size of 64... We use the LoRA (Hu et al., 2022) technique to fine-tune the token embedding layer and linear head layer of LLaMA2-7b for 1 epoch using the AdamW optimizer on two NVIDIA RTX A6000 GPUs... |
| Software Dependencies | No | The paper mentions specific LLM models like T5 (Raffel et al., 2020b) and LLaMA2-7b (Touvron et al., 2023), and techniques like LoRA (Hu et al., 2022), but does not specify version numbers for underlying programming languages or libraries (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For LLM fine-tuning, we pre-train T5 for 10 epochs using the AdamW optimizer on two NVIDIA RTX 3090 GPUs with a batch size of 64 and a peak learning rate of 1e-3. We apply warm-up for the first 5% of all training steps to adjust the learning rate, with a maximum input token length of 1024. We use the LoRA (Hu et al., 2022) technique to fine-tune the token embedding layer and linear head layer of LLaMA2-7b for 1 epoch using the AdamW optimizer on two NVIDIA RTX A6000 GPUs with a batch size of 28, a peak learning rate of 1e-5, a LoRA attention dimension of 16, and an alpha parameter of 32. |
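The split procedure quoted under "Dataset Splits" (random 80/10/10 with at least one training instance per user and item) can be sketched as follows. This is our own illustrative reconstruction, not the authors' code; the function name `split_interactions` and the record format are assumptions.

```python
import random


def split_interactions(interactions, seed=0):
    """Split (user, item, ...) records into roughly 80/10/10
    train/val/test sets, guaranteeing that every user and every item
    appears at least once in the training set."""
    rng = random.Random(seed)
    records = list(interactions)
    rng.shuffle(records)

    train, rest = [], []
    seen_users, seen_items = set(), set()
    # First pass: force a training instance for any not-yet-seen user/item.
    for rec in records:
        user, item = rec[0], rec[1]
        if user not in seen_users or item not in seen_items:
            train.append(rec)
            seen_users.add(user)
            seen_items.add(item)
        else:
            rest.append(rec)

    # Top up train to 80% of the total, then split the remainder 50/50
    # into validation and test.
    n_train = max(len(train), int(0.8 * len(records)))
    extra = n_train - len(train)
    train += rest[:extra]
    rest = rest[extra:]
    n_val = len(rest) // 2
    return train, rest[:n_val], rest[n_val:]
```

The coverage pass runs before the size top-up so that sparse users and items cannot end up only in validation or test, at the cost of the training fraction slightly exceeding 80% on very sparse data.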
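For readers comparing runs, the hyperparameters quoted under "Experiment Setup" can be collected in machine-readable form. The dictionary keys below are our own naming, not the paper's; only the values come from the quoted text.

```python
# T5 pre-training settings as reported in the paper.
T5_PRETRAIN = {
    "epochs": 10,
    "optimizer": "AdamW",
    "batch_size": 64,
    "peak_lr": 1e-3,
    "warmup_ratio": 0.05,      # warm-up over the first 5% of training steps
    "max_input_tokens": 1024,
    "gpus": "2x NVIDIA RTX 3090",
}

# LLaMA2-7b LoRA fine-tuning settings as reported in the paper
# (token embedding layer and linear head layer are fine-tuned).
LLAMA2_LORA = {
    "epochs": 1,
    "optimizer": "AdamW",
    "batch_size": 28,
    "peak_lr": 1e-5,
    "lora_r": 16,              # LoRA attention dimension
    "lora_alpha": 32,
    "gpus": "2x NVIDIA RTX A6000",
}
```

In a PEFT-style setup, `lora_r` and `lora_alpha` would map onto the `r` and `lora_alpha` fields of a LoRA configuration; the paper does not state which library was used.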