Efficient Inference for Large Language Model-based Generative Recommendation

Authors: Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on two real-world datasets demonstrate that AtSpeed significantly accelerates LLM-based generative recommendation, e.g., nearly 2× speedup under strict top-K verification and up to 2.5× speedup under relaxed sampling verification. We conduct extensive experiments using both verification strategies on two real-world recommendation datasets, demonstrating that AtSpeed significantly accelerates decoding for LLM-based recommendation (around 2× speedup).
Researcher Affiliation | Academia | 1 National University of Singapore, 2 Tsinghua University, 3 University of Science and Technology of China, 4 The Hong Kong Polytechnic University
Pseudocode | Yes | Algorithm 1: SD step with Top-K Strict Verification; Algorithm 2: SD step with Relaxed Sampling Verification
Open Source Code | Yes | The code and datasets are available at https://github.com/Linxyhaha/AtSpeed.
Open Datasets | Yes | To evaluate our proposed framework, we instantiate AtSpeed on a SOTA LLM-based generative recommender model, LC-Rec (Zheng et al., 2024), and test on two real-world recommendation datasets from the popular Amazon review benchmark (https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/): 1) Beauty contains user interactions with beauty products, and 2) Games collects user interactions with video games.
Dataset Splits | Yes | For both Beauty and Games, all interactions are sorted according to the global timestamps and then split into training, validation, and testing sets with a ratio of 8:1:1.
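The chronological 8:1:1 split described above can be sketched as follows. This is a minimal illustration, not the paper's preprocessing code; the `timestamp` field name and list-of-dicts layout are assumptions:

```python
def chronological_split(interactions, ratios=(0.8, 0.1, 0.1)):
    """Sort interactions by global timestamp, then slice into
    training/validation/testing portions (8:1:1 in the paper)."""
    ordered = sorted(interactions, key=lambda x: x["timestamp"])
    n = len(ordered)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = ordered[:n_train]
    valid = ordered[n_train:n_train + n_valid]
    test = ordered[n_train + n_valid:]
    return train, valid, test
```

Splitting on global timestamps (rather than per user) keeps all evaluation interactions strictly later in time than the training data, avoiding temporal leakage.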
Hardware Specification | Yes | Figure 1(a): the inference time costs of LC-Rec (Zheng et al., 2024) with LLaMA-7B on a single A5000 GPU. We train the draft model for 20 epochs on 4 NVIDIA RTX A5000 GPUs.
Software Dependencies | No | The paper mentions using the AdamW optimizer, LLaMA-7B, LLaMA-68M, and the LoRA fine-tuning technique, but does not provide specific version numbers for software libraries or frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | For draft model training, we use the AdamW optimizer with batch size = 64, learning rate = 0.001, and a cosine scheduler with a warmup of 200 steps to adjust the learning rate. We train the draft model for 20 epochs on 4 NVIDIA RTX A5000 GPUs. Meanwhile, we search the alignment strength α in {0.1, 0.3, 0.5, 0.7} and the weight decay in {0.01, 0.1}. We set draft length γ = 4, number of recommended items K ∈ {1, 3, 5, 10, 20}, and draft beam size N = 40.
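The cosine-with-warmup learning-rate schedule named in the setup can be sketched in plain Python. The base learning rate (0.001) and warmup length (200 steps) come from the paper; `total_steps` is an illustrative assumption, since the paper specifies epochs rather than a total step count:

```python
import math

def lr_at_step(step, base_lr=0.001, warmup_steps=200, total_steps=10_000):
    """Learning rate under a cosine schedule with linear warmup.

    Linearly ramps from 0 to base_lr over warmup_steps, then decays
    to 0 along a half-cosine over the remaining steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop this would typically be handled by a ready-made scheduler (e.g., `transformers.get_cosine_schedule_with_warmup`) wrapped around the AdamW optimizer, rather than computed by hand.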