LEGO-Learn: Label-Efficient Graph Open-Set Learning
Authors: Haoyan Xu, Kay Liu, Zhengtao Yao, Philip S. Yu, Mengyuan Li, Kaize Ding, Yue Zhao
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on four real-world datasets demonstrate that LEGO-Learn significantly outperforms leading methods, achieving up to a 6.62% improvement in ID classification accuracy and a 7.49% increase in AUROC for OOD detection. |
| Researcher Affiliation | Academia | Haoyan Xu EMAIL University of Southern California; Kay Liu EMAIL University of Illinois Chicago; Zhengtao Yao EMAIL University of Southern California; Philip S. Yu EMAIL University of Illinois Chicago; Mengyuan Li EMAIL University of Southern California; Kaize Ding EMAIL Northwestern University; Yue Zhao EMAIL University of Southern California |
| Pseudocode | Yes | Algorithm 1 The LEGO-Learn algorithm |
| Open Source Code | Yes | The code is available at: https://github.com/zhengtaoyao/lego. |
| Open Datasets | Yes | We test LEGO-Learn on four real-world datasets (Sen et al., 2008; Shchur et al., 2018; McAuley et al., 2015) that are widely used as benchmarks for node classification, i.e., Cora, Amazon Computers, Amazon Photo, and LastFMAsia. |
| Dataset Splits | Yes | For each dataset with C ID classes, we randomly select 10C ID nodes (10 per ID class) and the same number of OOD nodes as the validation set. We then randomly select 500 ID nodes and 500 OOD nodes as the test set. All remaining nodes constitute the "unlabeled node pool". |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using GNNs, GCNs, and OODGAT layers but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | All GCNs have 2 layers with hidden dimensions of 32. The weight for the unknown class in the filter's loss function is chosen from {0.001, 0.1, 0.2} based on the results of the validation set. All models use a learning rate of 0.01 and a dropout probability of 0.5. The initial label budget for all datasets is 5 nodes per ID class. The total label budget is 15 nodes per ID class. For each round of selection, we select 2C nodes from the unlabeled pool and annotate the selected nodes for all methods. For all K-Medoids based selection methods, the number of clusters is set to 48. In each round of node selection, we select the 2C nodes with the highest prediction uncertainty from the 48 medoids. |
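The K-Medoids based selection step described above (cluster the pool into 48 medoids, then label the 2C most uncertain medoids per round) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of softmax entropy as the uncertainty score, and the synthetic logits are all assumptions for demonstration.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_uncertain_medoids(medoid_probs, medoid_ids, budget):
    """From the medoid nodes, pick the `budget` nodes with the most uncertain predictions."""
    order = np.argsort(-entropy(medoid_probs))  # sort medoids by descending entropy
    return [medoid_ids[i] for i in order[:budget]]

# Illustrative round: 48 medoids, C = 7 ID classes (as for Cora), budget = 2C = 14.
rng = np.random.default_rng(0)
logits = rng.normal(size=(48, 7))                               # hypothetical GCN logits
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
medoid_ids = list(range(48))                                    # pool indices of the medoids
picked = select_uncertain_medoids(probs, medoid_ids, budget=14)
```

In each selection round, `picked` would be sent for annotation and removed from the unlabeled pool before the next round, until the total budget of 15 labels per ID class is exhausted.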