Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning

Authors: Yuankai Luo, Hongkang Li, Qijiong Liu, Lei Shi, Xiao-Ming Wu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 34 datasets, encompassing node classification, graph classification, link prediction, and attributed graph clustering, demonstrate that the generated node IDs significantly improve speed and memory efficiency while achieving performance competitive with current state-of-the-art methods.
Researcher Affiliation | Academia | All listed affiliations (Beihang University, The Hong Kong Polytechnic University, Rensselaer Polytechnic Institute) are academic institutions, and the provided email domains (buaa.edu.cn, polyu.edu.hk) are likewise academic.
Pseudocode | No | The paper describes the framework and methods using mathematical equations and textual explanations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our source code is available at https://github.com/LUOyk1999/NodeID."
Open Datasets | Yes | Table 11 summarizes the statistics and characteristics of the datasets. The first eight datasets are sourced from TUDataset (Morris et al., 2020), followed by two from LRGB (Dwivedi et al., 2022); the remaining datasets are obtained from Hu et al. (2020); Kipf & Welling (2017); Chien et al. (2020); Pei et al. (2019); Rozemberczki et al. (2021); McAuley et al. (2015); Leskovec & Krevl (2016); Mernyei & Cangea (2020); Lim et al. (2021); Platonov et al. (2023).
Dataset Splits | Yes | For Cora, Citeseer, and Pubmed, a 60%/20%/20% training/validation/testing split is used with accuracy as the evaluation metric, consistent with Pei et al. (2019). For Squirrel, Chameleon, Amazon-ratings, and Questions, the standard splits and evaluation metrics of Platonov et al. (2023) are followed; for the remaining datasets, those of Luo et al. (2024b). Scaffold splitting (Ramsundar et al., 2019) partitions the downstream graph datasets into 80%/10%/10% training/validation/testing sets, which mimics real-world use cases.
Hardware Specification | Yes | The experiments are conducted on a single workstation with 8 RTX 3090 GPUs.
Software Dependencies | No | The paper states: "Our implementation is based on PyG (Fey & Lenssen, 2019) and DGL (Wang et al., 2019b)." It mentions software such as PyG, DGL, the Adam optimizer, and LIBSVM, but it does not provide version numbers for these components, which are crucial for reproducibility.
Experiment Setup | Yes | Table 12 lists task-specific hyperparameter settings of the NID framework: codebook size K, number of MPNN layers L, hidden dimension, learning rate, epoch count, and MLP layers for each dataset and task. Additionally, the text states: "The β is set to 1." and "We utilize the Adam optimizer (Kingma & Ba, 2014) with the default settings. We set a learning rate of either 0.01 or 0.001 and an epoch limit of 1000. The ReLU function serves as the non-linear activation."
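For concreteness, the 60%/20%/20% node split reported above for Cora, Citeseer, and Pubmed can be sketched as a random index partition. This is a minimal illustration, not code from the paper's repository; `split_nodes` is a hypothetical helper, and the rounding behavior is an assumption.

```python
import numpy as np

def split_nodes(num_nodes, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition node indices into train/val/test sets.

    Hypothetical helper illustrating a 60%/20%/20% split; the paper's
    actual split code may differ (e.g., per-class balancing or rounding).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)          # shuffle all node indices
    n_train = int(ratios[0] * num_nodes)       # first 60% -> training
    n_val = int(ratios[1] * num_nodes)         # next 20% -> validation
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]              # remainder -> testing
    return train, val, test

# Cora has 2708 nodes
train, val, test = split_nodes(2708)
print(len(train), len(val), len(test))  # 1624 541 543
```

The same function with `ratios=(0.8, 0.1, 0.1)` mirrors the 80%/10%/10% proportions used with scaffold splitting, although scaffold splitting itself groups molecules by substructure rather than at random.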