RecFlow: An Industrial Full Flow Recommendation Dataset
Authors: Qi Liu, Kai Zheng, Rui Huang, Wuchao Li, Kuo Cai, Yuan Chai, Yanan Niu, Yiqun Hui, Bing Han, Na Mou, Hongning Wang, Wentian Bao, Yun Yu, Guorui Zhou, Han Li, Yang Song, Defu Lian, Kun Gai
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Leveraging RecFlow, we conduct extensive experiments to demonstrate its potential in designing novel algorithms that enhance effectiveness by incorporating stage-specific samples. ...We run all experiments five times with PyTorch (Imambi et al., 2021) on Nvidia 32G V100. We report the average result and standard deviation. ...We utilize the standard top-N ranking metrics, including Recall@K and NDCG@K. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, 2Kuaishou, 3Independent, 4Tsinghua University; {liandefu}@ustc.edu.cn, {hw5x}@virginia.edu, {wb2328}@columbia.edu, {yuenyun}@126.com |
| Pseudocode | No | The paper describes methods and algorithms in paragraph text and mathematical equations but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have made the dataset and source codes publicly available to promote reproducibility and advance RS research at https://github.com/RecFlow-ICLR/RecFlow. |
| Open Datasets | Yes | We propose RecFlow, an industrial large-scale full-flow dataset collected from the real industrial RS. ...We have made the dataset and source codes publicly available to promote reproducibility and advance RS research at https://github.com/RecFlow-ICLR/RecFlow. The dataset is licensed under CC-BY-NC-SA-4.0 International License. |
| Dataset Splits | Yes | To keep consistency with the real industrial RS’s online learning mode, we train SASRec with the first 36 days’ data day by day. The data from the last day is for evaluation. |
| Hardware Specification | Yes | We run all experiments five times with PyTorch (Imambi et al., 2021) on Nvidia 32G V100. |
| Software Dependencies | No | The paper mentions 'Pytorch' but does not specify a version number for the software used in the experiments. It cites 'Imambi et al., 2021', which refers to a book on PyTorch, not a specific version of the library itself. |
| Experiment Setup | Yes | The Multi-layer Perceptron (MLP) of the user and video towers in DSSM are set to be [128, 64, 32]. ... The batch size is 4,096 and the learning rate is 1e-1. BPR (Rendle et al., 2012) is the loss function, and Adam (Kingma & Ba, 2014) is used for optimization. |
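The evaluation metrics quoted above, Recall@K and NDCG@K, are standard top-N ranking measures. As a minimal illustration (not taken from the paper's released code), they can be computed for a single user with binary relevance as follows:

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the user's relevant items that appear in the top-k ranking."""
    top_k = set(ranked_items[:k])
    return len(top_k & set(relevant_items)) / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Normalized discounted cumulative gain with binary relevance."""
    relevant = set(relevant_items)
    # DCG: each hit at rank i (0-based) contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    # Ideal DCG: all relevant items ranked first.
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

print(recall_at_k([5, 3, 9, 1], [3, 7], k=3))  # → 0.5
print(ndcg_at_k([3, 5, 9], [3], k=3))          # → 1.0
```

In practice these per-user scores are averaged over all test users, which matches the paper's reporting of mean results over five runs.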
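To make the reported setup concrete, here is a hedged PyTorch sketch of a two-tower DSSM trained with BPR loss and Adam, using the paper's reported tower sizes [128, 64, 32], batch size 4,096, and learning rate 1e-1. All class and variable names (`DSSM`, `tower`, `bpr_loss`, the embedding dimension, and the user/item counts) are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn

def tower(in_dim, dims=(128, 64, 32)):
    """MLP tower; hidden sizes follow the reported [128, 64, 32]."""
    layers, d = [], in_dim
    for h in dims:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.pop()  # no activation after the final projection
    return nn.Sequential(*layers)

class DSSM(nn.Module):
    """Two-tower model: user and item embeddings pass through separate MLPs."""
    def __init__(self, n_users, n_items, emb_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.user_tower = tower(emb_dim)
        self.item_tower = tower(emb_dim)

    def score(self, users, items):
        u = self.user_tower(self.user_emb(users))
        v = self.item_tower(self.item_emb(items))
        return (u * v).sum(-1)  # dot-product relevance score

def bpr_loss(pos_scores, neg_scores):
    """BPR: push positive scores above sampled-negative scores."""
    return -torch.log(torch.sigmoid(pos_scores - neg_scores)).mean()

# One illustrative training step with the reported hyperparameters.
model = DSSM(n_users=1000, n_items=5000)
opt = torch.optim.Adam(model.parameters(), lr=1e-1)  # lr as reported
users = torch.randint(0, 1000, (4096,))  # batch size 4,096 as reported
pos = torch.randint(0, 5000, (4096,))
neg = torch.randint(0, 5000, (4096,))
loss = bpr_loss(model.score(users, pos), model.score(users, neg))
opt.zero_grad()
loss.backward()
opt.step()
```

The sketch omits the feature inputs and day-by-day online-learning loop the paper describes; it only shows how the quoted tower sizes, loss, and optimizer fit together.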