LEGO-Learn: Label-Efficient Graph Open-Set Learning

Authors: Haoyan Xu, Kay Liu, Zhengtao Yao, Philip S. Yu, Mengyuan Li, Kaize Ding, Yue Zhao

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on four real-world datasets demonstrate that LEGO-Learn significantly outperforms leading methods, achieving up to a 6.62% improvement in ID classification accuracy and a 7.49% increase in AUROC for OOD detection.
Researcher Affiliation | Academia | Haoyan Xu (University of Southern California); Kay Liu (University of Illinois Chicago); Zhengtao Yao (University of Southern California); Philip S. Yu (University of Illinois Chicago); Mengyuan Li (University of Southern California); Kaize Ding (Northwestern University); Yue Zhao (University of Southern California)
Pseudocode | Yes | Algorithm 1: The LEGO-Learn algorithm.
Open Source Code | Yes | The code is available at: https://github.com/zhengtaoyao/lego.
Open Datasets | Yes | We test LEGO-Learn on four real-world datasets (Sen et al., 2008; Shchur et al., 2018; McAuley et al., 2015) that are widely used as benchmarks for node classification, i.e., Cora, Amazon Computers, Amazon Photo, and LastFMAsia.
Dataset Splits | Yes | For each dataset with C ID classes, we randomly select 10×C ID nodes and the same number of OOD nodes as the validation set. We then randomly select 500 ID nodes and 500 OOD nodes as the test set. All remaining nodes constitute the "unlabeled node pool".
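The split protocol above (10×C ID validation nodes plus as many OOD nodes, then 500 ID and 500 OOD test nodes, with the remainder forming the unlabeled pool) can be sketched as follows. This is a minimal illustration, not the paper's released code; the function name and arguments are assumptions.

```python
import random

def make_splits(id_nodes, ood_nodes, num_classes, seed=0):
    """Sketch of the reported split protocol: 10*C ID nodes and an equal
    number of OOD nodes for validation, 500 ID + 500 OOD nodes for test,
    and all remaining nodes as the unlabeled pool. Names are illustrative."""
    rng = random.Random(seed)
    id_nodes, ood_nodes = list(id_nodes), list(ood_nodes)
    rng.shuffle(id_nodes)
    rng.shuffle(ood_nodes)
    n_val = 10 * num_classes  # validation nodes drawn per side (ID and OOD)
    val = id_nodes[:n_val] + ood_nodes[:n_val]
    test = id_nodes[n_val:n_val + 500] + ood_nodes[n_val:n_val + 500]
    pool = id_nodes[n_val + 500:] + ood_nodes[n_val + 500:]
    return val, test, pool
```

For Cora (C = 7) this yields a validation set of 140 nodes (70 ID + 70 OOD) and a 1,000-node test set.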
Hardware Specification No The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments.
Software Dependencies No The paper mentions using GNNs, GCNs, and OODGAT layers but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, Python version).
Experiment Setup | Yes | All GCNs have 2 layers with a hidden dimension of 32. The weight for the unknown class in the filter's loss function is chosen from {0.001, 0.1, 0.2} based on validation-set results. All models use a learning rate of 0.01 and a dropout probability of 0.5. The initial label budget for all datasets is 5 nodes per ID class, and the total label budget is 15 nodes per ID class. In each round of selection, we select 2×C nodes from the unlabeled pool and annotate them for all methods. For all K-Medoids-based selection methods, the number of clusters is set to 48; in each round, we select the 2×C nodes with the highest prediction uncertainty from the 48 medoids.
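The per-round selection rule described above (rank the 48 K-Medoids medoids by prediction uncertainty and label the top 2×C) could look roughly like this. This is a hedged sketch using predictive entropy as the uncertainty measure; the function and argument names are assumptions, not identifiers from the released code.

```python
import math

def select_nodes(medoid_ids, probs, num_classes, per_round=2):
    """Sketch of one active-learning round: among the cluster medoids,
    pick the per_round * C nodes whose class-probability distributions
    have the highest entropy (most uncertain predictions).
    probs maps node id -> list of class probabilities."""
    def entropy(p):
        return -sum(x * math.log(x) for x in p if x > 0)
    budget = per_round * num_classes  # 2*C nodes per round in the paper
    ranked = sorted(medoid_ids, key=lambda i: entropy(probs[i]), reverse=True)
    return ranked[:budget]
```

Ranking by entropy favors medoids near the decision boundary, so each round's labels concentrate where the classifier is least certain while the K-Medoids step keeps the batch diverse.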