BANGS: Game-theoretic Node Selection for Graph Self-Training

Authors: Fangxin Wang, Kay Liu, Sourav Medya, Philip Yu

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments. Experimental results validate the effectiveness of BANGS across various datasets and base models. By theoretically linking random walk and feature propagation, we enhance the scalability of our approach. Additionally, we demonstrate the effectiveness of BANGS under noisy labels and varying portions of training data.
Researcher Affiliation Academia Fangxin Wang, Kay Liu, Sourav Medya, Philip S. Yu, Department of Computer Science, University of Illinois Chicago
Pseudocode Yes C ALGORITHM FORMULATION In this section, we provide pseudo-code in Algorithm 1 and pipeline figure in Figure 3 for our method, BANGS.
Open Source Code Yes The codebase is available on https://github.com/fangxin-wang/BANGS.
Open Datasets Yes We test baseline methods and our method on five graph datasets: for Cora, Citeseer, and PubMed (Yang et al., 2016), we follow their official split; for LastFM (Rozemberczki & Sarkar, 2020) and Flickr (Zeng et al., 2019), we split them in similar proportions, with training, validation, and test data taking 5%, 15%, and 80%, respectively. The datasets can be found at: Cora, Citeseer, and PubMed (Yang et al., 2016) (https://github.com/kimiyoung/planetoid); LastFM (Rozemberczki & Sarkar, 2020) (https://github.com/benedekrozemberczki/FEATHER); Flickr (Zeng et al., 2019) (https://github.com/GraphSAINT/GraphSAINT). We employ the re-packaged datasets from PyG (Fey & Lenssen, 2019) (https://github.com/pyg-team/pytorch_geometric, version 2.5.2).
Dataset Splits Yes For Cora, Citeseer, and PubMed (Yang et al., 2016), we follow their official split; for LastFM (Rozemberczki & Sarkar, 2020) and Flickr (Zeng et al., 2019), we split them in similar proportions, with training, validation, and test data taking 5%, 15%, and 80%, respectively.
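The 5%/15%/80% split described above can be sketched as follows. This is a minimal illustration using a shuffled index list; the function name, seed, and fractions-as-parameters are assumptions, not the authors' code:

```python
import random

def split_indices(num_nodes, train_frac=0.05, val_frac=0.15, seed=0):
    """Randomly split node indices into train/val/test (5% / 15% / 80%).

    Hypothetical helper: shuffles all node indices once, then slices
    off the training and validation portions; the remainder is test.
    """
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

In practice the same effect can be achieved with PyG's `RandomNodeSplit` transform, which writes boolean masks onto the data object instead of returning index lists.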
Hardware Specification Yes The experiments are mainly run on a machine with an NVIDIA GeForce GTX 4090 Ti GPU with 24 GB memory and 80 GB main memory. Some experiments on small graphs are conducted on a MacBook Pro with an Apple M1 Pro chip and 16 GB memory.
Software Dependencies Yes We employ the re-packaged datasets from PyG (Fey & Lenssen, 2019) (https://github.com/pyg-team/pytorch_geometric, version 2.5.2).
Experiment Setup Yes The base model is set to Graph Convolutional Network (GCN) (Kipf & Welling, 2016) by default, while we also include results for other GNN models. ... For a fair comparison, we select the suggested hyperparameters for all baseline methods, especially in the node selection criterion. For instance, we use the confidence thresholds suggested by CaGCN, e.g., 0.8 for Cora and 0.9 for Citeseer. We set the maximum iteration number to 40, and use validation data for early stopping. For node selection, we sample 500 times to calculate Banzhaf values. The two varying hyperparameters are the number of candidate nodes K and the number of selected nodes k in each iteration. The value of k is set to 100 for small-scale graphs, i.e., Cora, Citeseer, and PubMed, and 400 for other larger graphs; K = k + 100.