Long-Sequence Recommendation Models Need Decoupled Embeddings

Authors: Ningya Feng, Junwei Pan, Jialong Wu, Baixu Chen, Ximei Wang, Qian Li, Xian Hu, Jie Jiang, Mingsheng Long

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and analysis demonstrate that DARE provides more accurate search of correlated behaviors and outperforms baselines, with AUC gains of up to 9 on public datasets and notable improvements on Tencent's advertising platform.
Researcher Affiliation | Collaboration | Ningya Feng (1), Junwei Pan (2), Jialong Wu (1), Baixu Chen (1), Ximei Wang (2), Qian Li (2), Xian Hu (2), Jie Jiang (2), Mingsheng Long (1, corresponding author). (1) School of Software, BNRist, Tsinghua University, China; (2) Tencent Inc., China.
Pseudocode | No | The paper describes its method mathematically and in prose, but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code in PyTorch for the experiments, including model analysis, is available at https://github.com/thuml/DARE.
Open Datasets | Yes | We use the publicly available Taobao (Zhu et al., 2018; 2019; Zhuo et al., 2020) and Tmall (Tianchi, 2018) datasets, which provide users' behavior data over specific time periods on their platforms. ... Tianchi. IJCAI-15 repeat buyers prediction dataset, 2018. URL https://tianchi.aliyun.com/dataset/dataDetail?dataId=42.
Dataset Splits | Yes | Training-validation-test split. We sequentially number history behaviors from 1 (the most recent behavior) to T (the most ancient behavior) according to the time step. The test set contains predictions of the first behaviors, while the second behaviors are used as the validation set. For the training set, we use the (3 + 5i)-th behaviors, 0 ≤ i ≤ 18. Models predict the j-th behavior based on the (j+200)-th to (j+1)-th behaviors (padding if the history is not long enough). Only users with behavior sequences longer than 210 are kept.
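The split rule quoted above can be sketched as a small helper. This is a hypothetical illustration, not the authors' code (which lives in the linked DARE repository); the behavior numbering follows the quote, with index 1 as the most recent behavior.

```python
# Sketch of the train/val/test split described above (hypothetical helper).
def split_user_behaviors(behaviors):
    """behaviors[0] is the most recent behavior (number 1),
    behaviors[-1] the most ancient (number T)."""
    if len(behaviors) <= 210:          # only users with >210 behaviors are kept
        return None

    test_target = 1                    # predict the 1st (most recent) behavior
    val_target = 2                     # the 2nd behavior forms the validation set
    train_targets = [3 + 5 * i for i in range(19)]  # (3+5i)-th, 0 <= i <= 18

    def history(j, length=200):
        # predict the j-th behavior from the (j+1)-th to (j+200)-th behaviors
        hist = behaviors[j : j + length]
        return hist + [0] * (length - len(hist))    # pad if history is too short

    return {
        "test":  (history(test_target), behaviors[test_target - 1]),
        "val":   (history(val_target),  behaviors[val_target - 1]),
        "train": [(history(j), behaviors[j - 1]) for j in train_targets],
    }
```

For a user with 300 behaviors, this yields one test example, one validation example, and 19 training examples, each conditioned on a 200-behavior history window.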
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | A.1 Hyper-parameters and model details. The hyper-parameters used are: retrieve number 20, epochs 2, batch size 2048, learning rate 0.01, weight decay 1e-6. We use the Adam optimizer. The layers of the multi-layer perceptron (MLP) are set to 200 × 80 × 2, the same as Zhou et al. (2024). These settings remain the same in all our experiments.
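The reported setup maps directly onto a few lines of PyTorch. A minimal sketch follows; the 200 × 80 × 2 MLP, Adam optimizer, learning rate, and weight decay come from the quote above, while the input width and ReLU activations are placeholder assumptions the quote does not restate.

```python
import torch
import torch.nn as nn

INPUT_DIM = 64  # hypothetical input width; not stated in the quoted setup

# 200 x 80 x 2 MLP as reported; ReLU activations are an assumption.
mlp = nn.Sequential(
    nn.Linear(INPUT_DIM, 200), nn.ReLU(),
    nn.Linear(200, 80), nn.ReLU(),
    nn.Linear(80, 2),  # two logits, e.g. click / non-click
)

# Adam with the reported learning rate and weight decay.
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.01, weight_decay=1e-6)
```

A forward pass on a batch of the reported size 2048 produces a (2048, 2) logit tensor; the remaining settings (retrieve number 20, 2 epochs) belong to the training loop rather than the model definition.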