Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

Authors: Shuyuan Zhao, Wei Chen, Boyan Shi, Liyong Zhou, Shuohao Lin, Huaiyu Wan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three real-world datasets show that STKDRec significantly outperforms the state-of-the-art baselines.
Researcher Affiliation | Academia | Shuyuan Zhao(1,2)*, Wei Chen(1,2)*, Boyan Shi(1,2), Liyong Zhou(1,2), Shuohao Lin(1,2), Huaiyu Wan(1,2). (1) School of Computer Science & Technology, Beijing Jiaotong University, Beijing, China; (2) Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China. EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, but it does not include a distinct section or figure explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | https://github.com/Zhaoshuyuan0246/STKDRec
Open Datasets | No | We select three publicly available city takeaway recommendation datasets for evaluation: Wuhan, Sanya, and Taiyuan, which are provided by the well-known takeaway platform Ele.me. Each dataset contains users' purchase sequences and associated attributes. The paper describes using publicly available datasets but does not provide specific links, DOIs, or formal citations for external access.
Dataset Splits | Yes | Finally, for each user's purchase sequence, the last purchased takeaway is designated as test data, the second-to-last as validation data, and the rest as training data.
Hardware Specification | Yes | The implementation is carried out in PyTorch on a single NVIDIA A40 GPU with 48GB of memory.
Software Dependencies | No | All evaluation methods are implemented in PyTorch. However, no specific version number for PyTorch or any other software dependency is provided.
Experiment Setup | Yes | For STKDRec, we conduct experiments with the following hyper-parameters. For the STKG encoder, the sampling depth m is set to 2, and the number of neighbor nodes sampled, s, is selected from {[5,5], [10,10], [15,15], [20,20]}. For the ST-Transformer module, we set the number of self-attention blocks and attention heads to 2 and the embedding dimension to 256. For STKD, the temperature τ is selected from {1, 3, 5, 7, 9}, and α is fixed at 0.2. We use Adam as the optimizer, with the learning rate, β1, and β2 set to 0.001, 0.9, and 0.98, respectively. The batch size and maximum sequence length n are both set to 128. An early stopping strategy is applied based on performance on the validation data.
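The dataset split and hyper-parameter settings reported above can be sketched in plain Python. This is a minimal illustration, not the authors' code: all function and key names are invented for readability; only the numeric values come from the quoted setup.

```python
# Hedged sketch of the evaluation protocol described above.
# Function and key names are illustrative; values are from the report.

def leave_one_out_split(sequence):
    """Per-user split: last item -> test, second-to-last -> validation,
    everything before that -> training data."""
    return sequence[:-2], sequence[-2], sequence[-1]

# Reported STKDRec hyper-parameters, gathered into one config dict.
STKDREC_CONFIG = {
    "stkg_sampling_depth": 2,                    # sampling depth m
    "neighbor_samples_grid": [[5, 5], [10, 10],  # candidate values for s
                              [15, 15], [20, 20]],
    "attention_blocks": 2,                       # ST-Transformer blocks
    "attention_heads": 2,
    "embedding_dim": 256,
    "kd_temperature_grid": [1, 3, 5, 7, 9],      # search grid for τ
    "kd_alpha": 0.2,                             # α, fixed
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "betas": (0.9, 0.98),                        # β1, β2
    "batch_size": 128,
    "max_seq_len": 128,                          # n
}

# Example split of a toy purchase sequence of item IDs.
train, valid, test = leave_one_out_split([3, 8, 8, 5, 9])
# train == [3, 8, 8], valid == 5, test == 9
```

Note that with this protocol every user contributes exactly one test and one validation item, so sequences shorter than three items leave no training data.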