Test-Time Adaptation on Recommender System with Data-Centric Graph Transformation
Authors: Yating Liu, Xin Zheng, Yi Li, Yanqing Guo
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate TTA-GREC's superiority at test time and provide new data-centric insights on test-time adaptation for better recommender system inference. Extensive experiments on multiple public datasets demonstrate that TTA-GREC significantly outperforms existing methods on key metrics such as Recall and NDCG (e.g., 4.46% Recall and 1.86% NDCG improvement on Last-FM). To evaluate the effectiveness of the proposed TTA-GREC, we compare its performance with baseline methods across multiple datasets, as shown in Table 1. To evaluate the contribution of each submodule in TTA-GREC, we conduct ablation studies by sequentially removing: (I) w/o UI transformation: removes the UI graph transformation, using only the original test UI list. (II) w/o KG revision: removes the KG transformation, using only the original KG embedding. (III) w/o CL: removes the sampling-based contrastive learning, using instead the Euclidean distance between embeddings. |
| Researcher Affiliation | Academia | Yating Liu¹, Xin Zheng², Yi Li¹, Yanqing Guo¹ (¹Dalian University of Technology, China; ²Griffith University, Australia) |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, a link to a code repository, or mention of code being available in supplementary materials. The conclusion mentions future work on 'more efficient TTA strategies' which suggests no immediate public release of code. |
| Open Datasets | Yes | We utilize three different datasets: Last-FM, MIND, and Alibaba-iFashion, which respectively represent different domains of recommender systems. Last-FM [Wang et al., 2019; Zhao et al., 2019]: a dataset of user-music interaction logs with rich metadata. MIND [Tian et al., 2021]: a news recommendation dataset with complex user-item interactions and semantic content. Alibaba-iFashion [Wang et al., 2021]: a dataset focused on fashion product recommendations, featuring dynamic user preferences and detailed item attributes. We follow the procedures and partitions in previous works [Wang et al., 2019; Tian et al., 2021; Wang et al., 2021; Yang et al., 2023]. |
| Dataset Splits | Yes | We follow the procedures and partitions in previous works [Wang et al., 2019; Tian et al., 2021; Wang et al., 2021; Yang et al., 2023]. For each KGNN model, we follow a standard training pipeline and train it on the training set until it achieves the best performance on the validation set in terms of recommendation. |
| Hardware Specification | Yes | Table 3: Runtime efficiency comparison (Evaluated on NVIDIA RTX 4090 GPU). |
| Software Dependencies | No | The paper mentions models and components like 'GCN', 'KGNNθ model', 'MLP', but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor the programming language version used for implementation. |
| Experiment Setup | Yes | We evaluate performance using Recall@N and NDCG@N, with N = 20, to assess the model's capability in generating top-N recommendations effectively. Hyper-parameter Sensitivity Analysis. The results in Figures 3 and 4 highlight the impact of mask size and temperature parameter on Recall and NDCG. Figure 3 shows the effect of different mask sizes on Recall and NDCG. The main observations are as follows: the best performance is achieved with a mask size of 128. Figure 4 shows the effect of different values of τ on Recall and NDCG. We observe that both Recall and NDCG reach their highest values when τ = 0.1. |
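For reference, the Recall@20 and NDCG@20 metrics quoted above can be computed as below. This is a minimal sketch with standard binary-relevance definitions; the function names (`recall_at_n`, `ndcg_at_n`) are illustrative and not from the paper, which does not publish its evaluation code.

```python
import math

def recall_at_n(ranked_items, relevant_items, n=20):
    """Fraction of the user's relevant items that appear in the top-n ranking."""
    if not relevant_items:
        return 0.0
    hits = len(set(ranked_items[:n]) & set(relevant_items))
    return hits / len(relevant_items)

def ndcg_at_n(ranked_items, relevant_items, n=20):
    """NDCG@n with binary relevance: DCG of the ranking over the ideal DCG."""
    relevant = set(relevant_items)
    # Gain 1/log2(rank+1) for each relevant item, with ranks starting at 1.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:n]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0
```

A perfect top-20 ranking of all relevant items yields NDCG@20 = 1.0, so the reported gains are relative improvements over baselines on these normalized metrics.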
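The paper's sampling-based contrastive loss is not spelled out in the excerpts above, but the role of the temperature τ (best value τ = 0.1 per Figure 4) can be illustrated with a generic InfoNCE-style formulation. This is a sketch under that assumption, operating on precomputed similarities; the function name `info_nce` is not from the paper.

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.1):
    """InfoNCE-style loss from similarity scores.

    sim_pos: similarity between the anchor and its positive sample.
    sim_negs: list of similarities between the anchor and negative samples.
    A smaller tau sharpens the softmax, penalizing hard negatives more.
    Returns -log softmax of the positive logit (lower is better).
    """
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

With the positive already ranked above the negatives, lowering τ from 1.0 to 0.1 drives the loss toward zero, which matches the intuition that a small temperature concentrates probability mass on the best-matching sample.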