When Do LLMs Help With Node Classification? A Comprehensive Analysis

Authors: Xixi Wu, Yifei Shen, Fangzhou Ge, Caihua Shan, Yizhu Jiao, Xiangguo Sun, Hong Cheng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Subsequently, we conducted extensive experiments, training and evaluating over 2,700 models, to determine the key settings (e.g., learning paradigms and homophily) and components (e.g., model size and prompt) that affect performance.
Researcher Affiliation | Collaboration | (1) Department of Systems Engineering and Engineering Management, and Shun Hing Institute of Advanced Engineering, The Chinese University of Hong Kong; (2) Microsoft Research Asia; (3) University of Illinois Urbana-Champaign.
Pseudocode | No | The paper describes methods and approaches in prose, but there are no structured blocks explicitly labeled as pseudocode or algorithm.
Open Source Code | Yes | Codes and datasets are released at https://llmnodebed.github.io/.
Open Datasets | Yes | Codes and datasets are released at https://llmnodebed.github.io/. ... The processed data is publicly available at https://huggingface.co/datasets/xxwu/LLMNodeBed. ... Cora and Pubmed (He et al., 2024), Citeseer (Chen et al., 2024b), and WikiCS (Liu et al., 2024). The remaining datasets already include text attributes in their official releases, including arXiv (Hu et al., 2020), Instagram and Reddit (Huang et al., 2024), Books, Computer, and Photo (Yan et al., 2023), Cornell, Texas, Wisconsin, and Washington (Wang et al., 2025).
Dataset Splits | Yes | For experimental datasets, we adopt the official splits designed for semi-supervised settings to ensure standardized evaluation. ... Specifically, we use a 60% training, 20% validation, and 20% testing split for most datasets. ... Detailed data splits are provided in Table 10 in the Appendix. ... For heterophilic graphs... For dataset splits, we assign semi-supervised and supervised settings with 1:1:8 and 6:2:2 splits for training, validation, and test sets, respectively.
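The 6:2:2 supervised split quoted above can be sketched as follows. This is an illustrative helper, not code from the released testbed; the function name and seed are assumptions.

```python
import numpy as np

def supervised_split(num_nodes: int, seed: int = 42):
    """Shuffle node indices and split them 60% / 20% / 20% into
    train / validation / test sets (the paper's supervised 6:2:2 setting).
    Illustrative sketch; not the authors' implementation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_nodes)
    n_train = int(0.6 * num_nodes)
    n_val = int(0.2 * num_nodes)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return train_idx, val_idx, test_idx
```

The semi-supervised 1:1:8 split follows the same pattern with fractions 0.1 / 0.1 / 0.8.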
Hardware Specification | Yes | All measurements were conducted on a single NVIDIA H100-80G GPU to ensure consistency. ... All recorded experiment times are based on a single NVIDIA H100-80G GPU. ... GPU Device: 1 NVIDIA A6000-48G ... 2 NVIDIA A6000-48G
Software Dependencies | No | We release LLMNodeBed, a PyG-based testbed designed to facilitate reproducible and rigorous research in LLM-based node classification algorithms. ... Open-source models can be easily loaded via the Transformers library.
Experiment Setup | Yes | For GNNs with arbitrary input embeddings... we perform a grid-search on the hyperparameters as follows: num layers in [2, 3, 4], hidden dimension in [32, 64, 128, 256], and dropout in [0.3, 0.5, 0.7]. ... For both GNNs and MLPs across experimental datasets, the learning rate is consistently set to 1e-2, following previous studies... The total number of epochs is set to 500 with a patience of 100. ... For SenBERT-66M and RoBERTa-355M, we set the training epochs to 10 for semi-supervised settings and 4 for supervised settings. The batch size is set to 32, and the learning rate is set to 2e-5.
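The grid search quoted above can be sketched as below. The search space and fixed learning rate are taken from the paper's setup; the `train_and_eval` callable is a hypothetical stand-in for the testbed's actual training loop.

```python
from itertools import product

# Search space quoted from the paper's experiment setup.
GRID = {
    "num_layers": [2, 3, 4],
    "hidden_dim": [32, 64, 128, 256],
    "dropout": [0.3, 0.5, 0.7],
}
LR = 1e-2          # fixed learning rate for GNNs and MLPs
MAX_EPOCHS = 500   # with an early-stopping patience of 100

def grid_search(train_and_eval):
    """Try every combination in GRID and keep the config with the best
    validation score. `train_and_eval(cfg)` is a user-supplied callable
    (hypothetical here) that trains one model and returns its metric."""
    best_cfg, best_score = None, float("-inf")
    keys = list(GRID)
    for values in product(*(GRID[k] for k in keys)):
        cfg = dict(zip(keys, values), lr=LR, max_epochs=MAX_EPOCHS)
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With 3 x 4 x 3 = 36 combinations per dataset, an exhaustive sweep like this is one way the study reaches its reported total of over 2,700 trained models.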