Unbiased Recommender Learning from Implicit Feedback via Weakly Supervised Learning

Authors: Hao Wang, Zhichao Chen, Haotian Wang, Yanchao Tan, Licheng Pan, Tianqiao Liu, Xu Chen, Haoxuan Li, Zhouchen Lin

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on three real-world datasets validate the efficacy of WeaklyRec in terms of improved recommendation quality. Code is available at https://github.com/HowardZJU/weakrec. ... Extensive evaluations on three publicly available real-world datasets substantiate the efficacy of the proposed method.
Researcher Affiliation Collaboration 1 Zhejiang University, 2 National University of Defense Technology, 3 TAL Education Group, 4 Renmin University of China, 5 Center for Data Science, Peking University, 6 State Key Lab of General AI, School of Intelligence Science and Technology, Peking University, 7 Institute for Artificial Intelligence, Peking University, 8 Pazhou Laboratory (Huangpu), Guangzhou, China.
Pseudocode Yes Algorithm 1 The computation workflow of WeaklyRec.
Open Source Code Yes Code is available at https://github.com/HowardZJU/weakrec.
Open Datasets Yes We utilize three real-world datasets: Yahoo! R3, Coat, and KuaiRec. These datasets are selected since they uniquely provide negative feedback for evaluating model performance and include unbiased test sets that simulate production environments, aligning with established ImplicitRec studies (Saito et al., 2020; Saito, 2020; Ren et al., 2023). In Yahoo! R3 and Coat, user-item pairs with ratings above 4 are labeled as positive, and the rest as negative. In KuaiRec, records with counts below two are considered negative, while the others are positive.
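The labeling rule quoted above can be sketched as a small helper. This is a minimal illustration, not the authors' code: the function name, the record layout, and the strict ">" reading of "ratings above 4" are assumptions.

```python
def binarize(records, threshold):
    """Turn explicit signals into binary implicit-feedback labels.

    records: iterable of (user, item, signal) tuples, where `signal`
    is a rating (Yahoo! R3, Coat) or an interaction count (KuaiRec).
    Pairs with signal strictly above `threshold` become positives;
    everything else is treated as negative.
    """
    return [(u, i, 1 if s > threshold else 0) for u, i, s in records]

# Yahoo! R3 / Coat style: ratings above 4 are positive.
labeled = binarize([("u1", "i1", 5), ("u1", "i2", 3)], threshold=4)

# KuaiRec style: counts below two are negative, i.e. count > 1 is positive.
counts = binarize([("u2", "i3", 2), ("u2", "i4", 1)], threshold=1)
```

The same helper covers both dataset families because each rule reduces to a single threshold on the observed signal.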
Dataset Splits Yes Each dataset is chronologically split into training, validation, and testing sets in a ratio of 0.8:0.1:0.1.
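The chronological 0.8:0.1:0.1 split can be sketched as follows; the record layout and the `int()` rounding at the cut points are assumptions, since the paper only states the ratio.

```python
def chronological_split(records, ratios=(0.8, 0.1, 0.1)):
    """Sort interactions by timestamp, then cut into train/val/test.

    records: iterable of (timestamp, user, item) tuples.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    n_train = int(len(ordered) * ratios[0])
    n_val = int(len(ordered) * ratios[1])
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test

logs = [(t, "u", "i") for t in range(10)]
train, val, test = chronological_split(logs)
# 10 records -> 8 train, 1 validation, 1 test
```

Sorting before slicing is what makes the split chronological: the test set contains only the latest interactions, mimicking deployment on future traffic.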
Hardware Specification Yes Experiments are conducted with two Intel Xeon Platinum 8383C CPUs (2.70 GHz) and eight NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies No All experiments are implemented in PyTorch using the Adam optimizer (Kingma and Ba, 2015) with early stopping (patience = 5).
Experiment Setup Yes We tune the learning rate in {0.005, 0.01, 0.05}, batch size in {256, 512, 1024, 2048}, and embedding size in {8, 16, 32}. All experiments are implemented in PyTorch using the Adam optimizer (Kingma and Ba, 2015) with early stopping (patience = 5). The candidate mass weights (wk) are set to [5, 10, 20, 50, 100], which balances the search spectrum and efficiency.
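The search space and stopping rule described in the quote can be sketched as below. The `EarlyStopping` class is an illustrative assumption (the paper only states Adam with patience = 5); the candidate lists enumerate exactly the reported values.

```python
from itertools import product

# Candidate hyperparameter values reported in the paper.
LEARNING_RATES = [0.005, 0.01, 0.05]
BATCH_SIZES = [256, 512, 1024, 2048]
EMBED_SIZES = [8, 16, 32]
MASS_WEIGHTS = [5, 10, 20, 50, 100]  # candidate mass weights (wk)


class EarlyStopping:
    """Stop training once validation loss fails to improve for
    `patience` consecutive epochs (the paper uses patience = 5)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Full grid over the reported search space: 3 * 4 * 3 = 36 configurations.
grid = list(product(LEARNING_RATES, BATCH_SIZES, EMBED_SIZES))
```

Each of the 36 configurations would be trained with Adam and monitored by one `EarlyStopping` instance; the configuration with the best validation metric is kept.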