Iterative Sparse Attention for Long-sequence Recommendation

Authors: Guanyu Lin, Jinwei Luo, Yinfeng Li, Chen Gao, Qun Luo, Depeng Jin

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two real-world datasets show the superiority of our proposed method against state-of-the-art baselines. In this section, we experiment on two real-world datasets and explore the answers to the research questions (RQs). RQ1: How does the proposed ISA outperform state-of-the-art sequential recommendation models? RQ2: What is the impact of our sparse attention components? Are the high-level components, the Sparse Attention Layer and the Iterative Attention Layer, effective? RQ3: Does the proposed ISA still outperform state-of-the-art sequential models when varying sequence lengths?
Researcher Affiliation | Collaboration | Guanyu Lin (1,2), Jinwei Luo (3), Yinfeng Li (1), Chen Gao* (1), Qun Luo (4), Depeng Jin (1). Affiliations: 1. BNRist, Tsinghua University; 2. Carnegie Mellon University; 3. Shenzhen University; 4. Tencent Inc.
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/tsinghua-fib-lab/ISA
Open Datasets | No | The paper evaluates recommendation performance on two large-scale datasets, Taobao and Short Video. The data statistics after 10-core filtering are reported in Table 2 ("Data statistics after 10-core setting filtering"), where mean length is the average sequence length per user.
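The 10-core filtering mentioned above (keeping only users and items with at least 10 interactions, re-applied until stable) can be sketched as follows; the function name and data layout are illustrative, not taken from the paper's code:

```python
from collections import Counter

def k_core_filter(interactions, k=10):
    """Iteratively drop users/items with fewer than k interactions.

    Repeats until every remaining user and item appears at least k
    times, since removing one side can push the other below the
    threshold again.
    """
    pairs = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = [(u, i) for u, i in pairs
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(pairs):  # fixed point reached
            return kept
        pairs = kept
```

With k=2 on a toy log, the single interaction of user u3 is dropped because u3 has fewer than two clicks, and the remaining pairs all satisfy the threshold.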
Dataset Splits | No | Input: click item sequence Iu = (i1, i2, ..., it), t ≤ 100, for user u. Output: click probability of user u for target item i(t+1). The paper evaluates recommendation performance on two large-scale datasets, with data statistics after 10-core filtering given in Table 2. These describe the task setup and dataset characteristics, but not the specific train/validation/test splits used in the experiments.
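The stated task setup, a click sequence as input and the click probability of the next item as output, implies training pairs of the form (history prefix, next item). A minimal sketch, assuming simple prefix expansion and the length cap of 100 from the input definition (the helper name is hypothetical):

```python
def make_next_item_samples(click_seq, max_len=100):
    """Turn one user's click sequence into (history, target) pairs.

    Each prefix (i_1, ..., i_t) predicts the next click i_(t+1);
    histories longer than max_len keep only the most recent
    max_len items.
    """
    samples = []
    for t in range(1, len(click_seq)):
        history = click_seq[max(0, t - max_len):t]
        samples.append((history, click_seq[t]))
    return samples
```

How such samples are then partitioned into train/validation/test sets is exactly what the paper leaves unspecified.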
Hardware Specification | No | The paper does not provide specific details on the hardware used for the experiments, such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper does not provide specific details on software dependencies, such as library names with version numbers, used in the experiments.
Experiment Setup | No | The paper mentions hyperparameters such as the teleport probability α for Personalized PageRank, the window size w for window attention, the number of random items r for random attention, and the regularization weight λ. However, it does not report the specific values used for these hyperparameters, nor system-level training configurations such as learning rate, batch size, or optimizer settings.
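The window attention and random attention components named above can be illustrated with a toy attention-mask builder. The parameters w and r follow the paper's notation, but the construction itself (and the omission of the Personalized PageRank component) is an assumption for illustration, not the paper's implementation:

```python
import random

def sparse_attention_mask(seq_len, w=4, r=2, seed=0):
    """Boolean mask combining window attention and random attention.

    mask[q][k] == True means query position q may attend to key
    position k: every position attends to its local window of radius
    w, plus r uniformly sampled extra positions.
    """
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        # window attention: local neighborhood of radius w
        for k in range(max(0, q - w), min(seq_len, q + w + 1)):
            mask[q][k] = True
        # random attention: r extra randomly chosen keys
        for k in rng.sample(range(seq_len), min(r, seq_len)):
            mask[q][k] = True
    return mask
```

Each query attends to O(w + r) keys instead of all seq_len, which is the usual motivation for sparse attention over long interaction sequences.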