Multi-label Node Classification On Graph-Structured Data

Authors: Tianqi Zhao, Thi Ngan Dong, Alan Hanjalic, Megha Khosla

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by current state of the art in the multi-label node classification scenario."
Researcher Affiliation | Academia | "Tianqi Zhao (EMAIL), Department of Intelligent Systems, Delft University of Technology; Ngan Thi Dong (EMAIL), L3S Research Center, Hannover, Germany; Alan Hanjalic (EMAIL), Delft University of Technology; Megha Khosla (EMAIL), Delft University of Technology"
Pseudocode | No | The paper describes methodologies and processes but does not include any clearly labeled pseudocode or algorithm blocks. It provides mathematical definitions and descriptions in prose.
Open Source Code | Yes | "We release our benchmark at https://github.com/Tianqi-py/MLGNC." "Our code is available at https://github.com/Tianqi-py/MLGNC."
Open Datasets | Yes | "The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties." "In particular, we curate 3 biological graph datasets using publicly available data. The detailed pre-processing steps and the original data sources are discussed in Appendix A.1.1, A.1.2, and A.1.3."
Dataset Splits | Yes | "For all datasets except OGB-Proteins, Hum Loc, and Euk Loc we generate 3 random training, validation, and test splits with 60%, 20%, and 20% of the data. For OGB-Proteins, Hum Loc, and Euk Loc we follow the predefined data splits from (Hu et al., 2020), (Shen & Chou, 2007) and (Chou & Shen, 2007) respectively."
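The split procedure quoted above (three independent random 60/20/20 train/validation/test splits) can be sketched as follows. This is a minimal illustration, not the paper's released code; the node count and seed values are illustrative assumptions.

```python
import numpy as np

def random_split(num_nodes, seed, train_frac=0.6, val_frac=0.2):
    """Shuffle node indices and cut them into train/val/test parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return (perm[:n_train],                 # 60% training nodes
            perm[n_train:n_train + n_val],  # 20% validation nodes
            perm[n_train + n_val:])         # 20% test nodes

# Three random splits, one per seed (seed values are an assumption).
splits = [random_split(num_nodes=1000, seed=s) for s in (0, 1, 2)]
```

Each split partitions the node index set, so every node appears in exactly one of the three parts.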
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments. It only details models, hyperparameters, and datasets.
Software Dependencies | No | The paper provides hyperparameter settings in Appendix A.3 but does not explicitly list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "The training and hyperparameter settings for each model are summarized in Appendix A.3 in Tables 7 and 8." Table 7 lists the hyperparameter settings for the MLP and GNN baselines for all datasets; Table 8 lists the hyperparameter settings for DeepWalk for all datasets.