Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Partition-Based Active Learning for Graph Neural Networks

Authors: Jiaqi Ma, Ziqiao Ma, Joyce Chai, Qiaozhu Mei

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple benchmark datasets demonstrate that the proposed method outperforms existing active learning methods for GNNs under a wide range of annotation budget constraints. In addition, the proposed method does not introduce additional hyperparameters, which is crucial for model training, especially in the active learning setting where a labeled validation set may not be available. (Section 5: Experiments)
Researcher Affiliation | Academia | Jiaqi Ma (School of Information Sciences, University of Illinois Urbana-Champaign); Ziqiao Ma (Department of Computer Science and Engineering, University of Michigan); Joyce Chai (Department of Computer Science and Engineering, University of Michigan); Qiaozhu Mei (School of Information and Department of Computer Science and Engineering, University of Michigan)
Pseudocode | Yes | Algorithm 1: Graph-Partition-Based Query. Input: a K-partition T_K of the graph, a budget b. Output: a subset of unlabelled nodes s1 of size b, with s1 ⊆ V \ s0 and |s1| = b.
1: Set s1 = ∅.
2: for T_k ∈ T_K do
3:   b_k ← b // K.
4:   T_k ← T_k \ (s0 ∪ s1).
5:   E_k ← {g(v_i)}_{i ∈ T_k}.
6:   s ← b_k-Medoids(E_k). // Perform K-Medoids clustering on the data points E_k, with the b_k medoids returned as s.
7:   s1 ← s1 ∪ s.
8: end for
9: return s1
Open Source Code | No | The paper does not provide a link to the source code for the described methodology, nor does it state that the code is released or available in supplementary materials. It only mentions a third-party toolkit (kneebow), not the authors' own implementation.
Open Datasets | Yes | Dataset. We experiment on citation networks Citeseer, Cora, and Pubmed (Sen et al., 2008), three standard node classification benchmarks. We also experiment on Corafull (Bojchevski & Günnemann, 2018) and Ogbn-Arxiv (Hu et al., 2020b), for performance on denser networks with more classes, and on co-authorship networks (Shchur et al., 2018) for diversity.
Dataset Splits | No | The paper notes that in the active learning setup there are not enough labeled samples for a validation set, so evaluation is over the full graph. It discusses label budgets for selecting nodes to annotate, but gives no explicit training/validation/test splits (percentages, counts, or references to predefined splits) that would make the dataset partitioning reproducible.
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory specifications used for the experiments; it only describes the GNN models and optimizers used.
Software Dependencies | No | The paper mentions PyTorch Geometric (Fey & Lenssen, 2019) for dataset preprocessing and model construction, and the kneebow toolkit for determining the number of communities, but specifies version numbers for neither, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | To train each model, we use an Adam optimizer with an initial learning rate of 1e-2 and weight decay of 5e-4. As in the active learning setup there should not be enough labeled samples to be used as a validation set, we train the GNN model for a fixed 300 epochs in all experiments and evaluate over the full graph.
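The quoted Algorithm 1 can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code (which, per the Open Source Code row, is not released): `k_medoids`, `partition_based_query`, and the simple alternating medoid-update loop are assumptions; the paper's `g(v_i)` node embeddings are stood in for by a precomputed embedding matrix.

```python
import numpy as np

def k_medoids(points, k, n_iter=20, seed=0):
    # Plain alternating k-medoids (an assumed, simplified stand-in for the
    # paper's bk-Medoids step); returns indices of medoids into `points`.
    rng = np.random.default_rng(seed)
    n = len(points)
    medoids = rng.choice(n, size=min(k, n), replace=False)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(len(medoids)):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimizing total distance to its cluster.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

def partition_based_query(partitions, embeddings, labeled, budget):
    # Algorithm 1 sketch: allocate budget // K per partition, drop already
    # labeled/selected nodes, and pick the b_k medoids of each partition's
    # embeddings as the query set s1.
    K = len(partitions)
    selected = []
    for part in partitions:
        b_k = budget // K  # the paper's b // K allocation per partition
        cand = [v for v in part if v not in labeled and v not in selected]
        if not cand or b_k == 0:
            continue
        E_k = embeddings[cand]
        med = k_medoids(E_k, b_k)
        selected.extend(cand[i] for i in med)
    return selected
```

On a toy graph with two partitions {0, 1} and {2, 3}, node 0 already labeled, and budget 2, the query returns one unlabeled medoid per partition.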
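The hyperparameters quoted under Experiment Setup (Adam, learning rate 1e-2, weight decay 5e-4, a fixed 300 epochs with no validation set) can be illustrated with a minimal NumPy Adam step. This is a sketch under assumptions: `adam_step` and the toy quadratic loss are illustrative, not the paper's GNN training code, and weight decay is folded into the gradient in the classic coupled-L2 style.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-2, weight_decay=5e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update with L2 weight decay added to the gradient
    # (coupled weight decay, as in classic Adam-with-L2).
    grad = grad + weight_decay * param
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Fixed 300 epochs with no validation-based early stopping, mirroring the
# quoted setup, on a toy quadratic loss (w - 1)^2 standing in for the
# GNN training objective.
w = np.array([2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 301):
    grad = 2.0 * (w - 1.0)
    w, m, v = adam_step(w, grad, m, v, t)
# After 300 steps, w sits close to the minimizer near 1.0.
```

The fixed epoch count matters here: with no labeled validation set, the usual early-stopping criterion is unavailable, so the stopping rule must be chosen up front.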