Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning

Authors: Yuankai Luo, Hongkang Li, Qijiong Liu, Lei Shi, Xiao-Ming Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 34 datasets, encompassing node classification, graph classification, link prediction, and attributed graph clustering, demonstrate that the generated node IDs significantly improve speed and memory efficiency while achieving performance competitive with current state-of-the-art methods.
Researcher Affiliation | Academia | All listed affiliations (Beihang University, The Hong Kong Polytechnic University, Rensselaer Polytechnic Institute) are academic institutions, and the provided email domains (buaa.edu.cn, polyu.edu.hk) are likewise academic.
Pseudocode | No | The paper describes the framework and methods using mathematical equations and textual explanations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our source code is available at https://github.com/LUOyk1999/NodeID."
Open Datasets | Yes | Table 11 summarizes the statistics and characteristics of the datasets. The first eight datasets are sourced from TUDataset (Morris et al., 2020), followed by two from LRGB (Dwivedi et al., 2022); the remaining datasets are obtained from Hu et al. (2020); Kipf & Welling (2017); Chien et al. (2020); Pei et al. (2019); Rozemberczki et al. (2021); McAuley et al. (2015); Leskovec & Krevl (2016); Mernyei & Cangea (2020); Lim et al. (2021); Platonov et al. (2023).
Dataset Splits | Yes | For Cora, Citeseer, and Pubmed, a 60%/20%/20% training/validation/testing split is used with accuracy as the evaluation metric, consistent with Pei et al. (2019). For Squirrel, Chameleon, Amazon-ratings, and Questions, the standard splits and evaluation metrics of Platonov et al. (2023) are followed; for the remaining datasets, those of Luo et al. (2024b). Scaffold splitting (Ramsundar et al., 2019) partitions the downstream graph datasets into 80%/10%/10% training/validation/testing sets, which mimics real-world use cases.
Hardware Specification | Yes | The experiments are conducted on a single workstation with 8 RTX 3090 GPUs.
Software Dependencies | No | The paper states: "Our implementation is based on PyG (Fey & Lenssen, 2019) and DGL (Wang et al., 2019b)." It mentions software such as PyG, DGL, the Adam optimizer, and LIBSVM, but it does not provide version numbers for these components, which are crucial for reproducibility.
Experiment Setup | Yes | Table 12 lists task-specific hyperparameter settings of the NID framework: codebook size K, number of MPNN layers L, hidden dimension, learning rate, epoch count, and MLP layers for each dataset and task. Additionally, the text states: "The β is set to 1." and "We utilize the Adam optimizer (Kingma & Ba, 2014) with the default settings. We set a learning rate of either 0.01 or 0.001 and an epoch limit of 1000. The ReLU function serves as the non-linear activation."
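For concreteness, the 60%/20%/20% node split reported above for Cora, Citeseer, and Pubmed can be sketched as a random index partition. This is a minimal illustration, not code from the paper's repository; `split_nodes` is a hypothetical helper, and the rounding behavior is an assumption.

```python
import numpy as np

def split_nodes(num_nodes, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition node indices into train/val/test sets.

    Hypothetical helper illustrating a 60%/20%/20% split; the paper's
    actual split code may differ (e.g., per-class balancing or rounding).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)          # shuffle all node indices
    n_train = int(ratios[0] * num_nodes)       # first 60% -> training
    n_val = int(ratios[1] * num_nodes)         # next 20% -> validation
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]              # remainder -> testing
    return train, val, test

# Cora has 2708 nodes
train, val, test = split_nodes(2708)
print(len(train), len(val), len(test))  # 1624 541 543
```

The same function with `ratios=(0.8, 0.1, 0.1)` mirrors the 80%/10%/10% proportions used with scaffold splitting, although scaffold splitting itself groups molecules by substructure rather than at random.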