Long-Sequence Recommendation Models Need Decoupled Embeddings

Authors: Ningya Feng, Junwei Pan, Jialong Wu, Baixu Chen, Ximei Wang, Qian Li, Xian Hu, Jie Jiang, Mingsheng Long

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and analysis demonstrate that DARE provides more accurate search of correlated behaviors and outperforms baselines, with AUC gains of up to 9 on public datasets and notable improvements on Tencent's advertising platform.
Researcher Affiliation | Collaboration | Ningya Feng (1), Junwei Pan (2), Jialong Wu (1), Baixu Chen (1), Ximei Wang (2), Qian Li (2), Xian Hu (2), Jie Jiang (2), Mingsheng Long (1, corresponding author). (1) School of Software, BNRist, Tsinghua University, China; (2) Tencent Inc., China.
Pseudocode | No | The paper describes its method mathematically and in prose, but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code in PyTorch for the experiments, including model analysis, is available at https://github.com/thuml/DARE.
Open Datasets | Yes | We use the publicly available Taobao (Zhu et al., 2018; 2019; Zhuo et al., 2020) and Tmall (Tianchi, 2018) datasets, which provide users' behavior data over specific time periods on their platforms. ... Tianchi. IJCAI-15 repeat buyers prediction dataset, 2018. URL https://tianchi.aliyun.com/dataset/dataDetail?dataId=42.
Dataset Splits | Yes | Training-validation-test split. We sequentially number history behaviors from 1 (the most recent behavior) to T (the most ancient behavior) according to the time step. The test set contains predictions of the first behaviors, while the second behaviors are used as the validation set. For the training set, we use the (3 + 5i)-th behaviors, 0 ≤ i ≤ 18. Models predict the j-th behavior based on the (j+200)-th to (j+1)-th behaviors (padding if the history is not long enough). Only users with behavior sequences longer than 210 are kept.
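The split rule quoted above can be sketched as a small helper. This is a hypothetical illustration, not the authors' code (which lives in the linked DARE repository); the behavior numbering follows the quote, with index 1 as the most recent behavior.

```python
# Sketch of the train/val/test split described above (hypothetical helper).
def split_user_behaviors(behaviors):
    """behaviors[0] is the most recent behavior (number 1),
    behaviors[-1] the most ancient (number T)."""
    if len(behaviors) <= 210:          # only users with >210 behaviors are kept
        return None

    test_target = 1                    # predict the 1st (most recent) behavior
    val_target = 2                     # the 2nd behavior forms the validation set
    train_targets = [3 + 5 * i for i in range(19)]  # (3+5i)-th, 0 <= i <= 18

    def history(j, length=200):
        # predict the j-th behavior from the (j+1)-th to (j+200)-th behaviors
        hist = behaviors[j : j + length]
        return hist + [0] * (length - len(hist))    # pad if history is too short

    return {
        "test":  (history(test_target), behaviors[test_target - 1]),
        "val":   (history(val_target),  behaviors[val_target - 1]),
        "train": [(history(j), behaviors[j - 1]) for j in train_targets],
    }
```

For a user with 300 behaviors, this yields one test example, one validation example, and 19 training examples, each conditioned on a 200-behavior history window.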
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | A.1 Hyper-parameters and model details. The hyper-parameters used are: retrieve number 20, epochs 2, batch size 2048, learning rate 0.01, weight decay 1e-6. We use the Adam optimizer. The layers of the multi-layer perceptron (MLP) are set to 200 × 80 × 2, the same as Zhou et al. (2024). These settings remain the same in all our experiments.
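The reported setup maps directly onto a few lines of PyTorch. A minimal sketch follows; the 200 × 80 × 2 MLP, Adam optimizer, learning rate, and weight decay come from the quote above, while the input width and ReLU activations are placeholder assumptions the quote does not restate.

```python
import torch
import torch.nn as nn

INPUT_DIM = 64  # hypothetical input width; not stated in the quoted setup

# 200 x 80 x 2 MLP as reported; ReLU activations are an assumption.
mlp = nn.Sequential(
    nn.Linear(INPUT_DIM, 200), nn.ReLU(),
    nn.Linear(200, 80), nn.ReLU(),
    nn.Linear(80, 2),  # two logits, e.g. click / non-click
)

# Adam with the reported learning rate and weight decay.
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.01, weight_decay=1e-6)
```

A forward pass on a batch of the reported size 2048 produces a (2048, 2) logit tensor; the remaining settings (retrieve number 20, 2 epochs) belong to the training loop rather than the model definition.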