Towards S²-Challenges Underlying LLM-Based Augmentation for Personalized News Recommendation
Authors: Shicheng Wang, Hengzhu Tang, Li Gao, Shu Guo, Suqi Cheng, Junfeng Wang, Dawei Yin, Tingwen Liu, Lihong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on two real-world news recommendation datasets, MIND-Large and MIND-Small, and empirical results demonstrate the effectiveness of our approach from multiple perspectives. ... This section conducts experiments to evaluate the performance of our model, S2LENR. We conduct extensive experiments on two large-scale real-world datasets, MIND-Large and MIND-Small, to evaluate the effectiveness of our method. The overall performance results are displayed in Table 2 and Table 3 respectively, where the best results are in bold and the second best are underlined. ... In this section, we conduct elaborate ablation experiments on the MIND-Small dataset, in order to further evaluate the effectiveness of all innovative components in our method. |
| Researcher Affiliation | Collaboration | Shicheng Wang (1,2), Hengzhu Tang (3), Li Gao (3)*, Shu Guo (4), Suqi Cheng (3), Junfeng Wang (3), Dawei Yin (3), Tingwen Liu (1,2), Lihong Wang (4). Affiliations: 1 Institute of Information Engineering, Chinese Academy of Sciences; 2 School of Cyber Security, University of Chinese Academy of Sciences; 3 Baidu Inc.; 4 National Computer Network Emergency Response Technical Team/Coordination Center. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and natural language, but does not contain any clearly labeled pseudocode or algorithm blocks. For example, the detailed steps for 'News Encoder' and 'User Encoder' are presented through equations (1) to (4) and accompanying textual descriptions. |
| Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement indicating that the code for the described methodology (S2LENR) is publicly available. The links provided are for datasets or a third-party API (GPT-4). |
| Open Datasets | Yes | We conduct extensive experiments on two large-scale real-world datasets, MIND-Large (1) and MIND-Small (2), to evaluate the effectiveness of our method. MIND-Large dataset collected from Microsoft News platform contains two record documents. ... (1) https://msnews.github.io/ (2) A small version of the MIND-Large dataset obtained by randomly sampling 50,000 users and their behavior logs. |
| Dataset Splits | Yes | The click behaviors in the first four weeks are regarded as user reading history, the behaviors in the penultimate week are used for training, and the data in the last week is used for performance evaluation. ... Denoting a clicked candidate news in the training set as a positive sample n_i, i.e., y_{u,i} = 1, we then randomly choose P non-clicked candidate news as negative samples [n_{i,1}, ..., n_{i,P}] from the same impression displayed to the target user. ... the negative sampling ratio is 4. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as particular GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'pre-trained Glove embeddings (Pennington, Socher, and Manning 2014)' and 'Adam optimizer (Kingma and Ba 2014)' for training, as well as calling 'GPT-4 API'. However, it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python version, PyTorch/TensorFlow versions, or specific GPT-4 API version). |
| Experiment Setup | Yes | For news content modeling, we utilize the first 30 words of news titles to learn the corresponding news representations. For user interest modeling, we treat the 50 most recently clicked news as the user's reading history. Moreover, news and user representations, as well as latent embeddings, are all 400-dimensional vectors, i.e., d = 400. For hyper-parameters, the number of generated news #NUM in prompt π is 5, the number of prototypes K is set to 1000, the joint learning weight α is set to 0.1, and the negative sampling ratio is 4. In addition, we utilize the dropout technique and the Adam optimizer (Kingma and Ba 2014) for training. The dropout rate and learning rate are 0.1 and 0.001 respectively. |
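The dataset-split and setup rows above describe per-impression negative sampling (each clicked news paired with P = 4 non-clicked news from the same impression) together with the reported hyper-parameters. A minimal sketch of that sampling step, with the reported values collected in a config dict, is below; the function and variable names are illustrative, not from the paper's code (which is not released).

```python
import random

# Hyper-parameters as reported in the Experiment Setup row.
CONFIG = {
    "title_max_words": 30,       # first 30 words of each news title
    "history_max_news": 50,      # 50 most recently clicked news per user
    "embed_dim": 400,            # d = 400 for news/user/latent embeddings
    "num_generated_news": 5,     # #NUM in prompt pi
    "num_prototypes": 1000,      # K
    "joint_loss_weight": 0.1,    # alpha
    "negative_sampling_ratio": 4,  # P
    "dropout": 0.1,
    "learning_rate": 0.001,
}


def sample_training_pairs(impression, clicked_ids, ratio=4, rng=random):
    """For each clicked (positive) news in an impression, randomly draw
    `ratio` non-clicked news from the SAME impression as negatives.

    Returns a list of (positive_id, [negative_ids]) tuples.
    """
    non_clicked = [n for n in impression if n not in clicked_ids]
    pairs = []
    for news_id in impression:
        if news_id in clicked_ids:  # y_{u,i} = 1
            negatives = rng.sample(non_clicked, min(ratio, len(non_clicked)))
            pairs.append((news_id, negatives))
    return pairs
```

With the ratio of 4 from the paper, each positive yields one (1 positive, 4 negatives) group, which is the usual input to a softmax-over-candidates training objective.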