BANGS: Game-theoretic Node Selection for Graph Self-Training
Authors: Fangxin Wang, Kay Liu, Sourav Medya, Philip Yu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results validate the effectiveness of BANGS across various datasets and base models. By theoretically linking random walk and feature propagation, we enhance the scalability of our approach. Additionally, we demonstrate the effectiveness of BANGS under noisy labels and varying portions of training data. |
| Researcher Affiliation | Academia | Fangxin Wang, Kay Liu, Sourav Medya, Philip S. Yu, Department of Computer Science, University of Illinois Chicago. fwang51, zliu234, medya, EMAIL |
| Pseudocode | Yes | C ALGORITHM FORMULATION In this section, we provide pseudo-code in Algorithm 1 and pipeline figure in Figure 3 for our method, BANGS. |
| Open Source Code | Yes | The codebase is available on https://github.com/fangxin-wang/BANGS. |
| Open Datasets | Yes | We test baseline methods and our method on five graph datasets: for Cora, Citeseer, and PubMed (Yang et al., 2016), we follow their official split; for LastFM (Rozemberczki & Sarkar, 2020) and Flickr (Zeng et al., 2019), we split them such that training, validation, and test data take 5%, 15%, and 80%, respectively. The datasets can be found at: Cora, Citeseer, and PubMed (Yang et al., 2016) (https://github.com/kimiyoung/planetoid); LastFM (Rozemberczki & Sarkar, 2020) (https://github.com/benedekrozemberczki/FEATHER); Flickr (Zeng et al., 2019) (https://github.com/GraphSAINT/GraphSAINT). We employ the re-packaged datasets from PyG (Fey & Lenssen, 2019) (https://github.com/pyg-team/pytorch_geometric, version 2.5.2). |
| Dataset Splits | Yes | For Cora, Citeseer, and PubMed (Yang et al., 2016), we follow their official split; for LastFM (Rozemberczki & Sarkar, 2020) and Flickr (Zeng et al., 2019), we split them such that training, validation, and test data take 5%, 15%, and 80%, respectively. |
| Hardware Specification | Yes | The experiments are mainly run on a machine with an NVIDIA GeForce GTX 4090 Ti GPU with 24 GB memory, and 80 GB main memory. Some experiments on small graphs are conducted on a MacBook Pro with an Apple M1 Pro chip and 16 GB memory. |
| Software Dependencies | Yes | We employ the re-packaged datasets from PyG (Fey & Lenssen, 2019) (https://github.com/pyg-team/pytorch_geometric, version 2.5.2). |
| Experiment Setup | Yes | The base model is set to Graph Convolutional Network (GCN) (Kipf & Welling, 2016) by default, while we also include results for other GNN models. ... For a fair comparison, we select the suggested hyperparameters for all baseline methods, especially in the node selection criterion. For instance, we use the confidence thresholds suggested by CaGCN, e.g., 0.8 for Cora and 0.9 for Citeseer. We set the max iteration number to 40, and use validation data for early stopping. For node selection, we sample 500 times to calculate Banzhaf values. The two varying hyperparameters are the number of candidate nodes K and selected nodes k in each iteration. The value of k is set to 100 for small-scale graphs, i.e., Cora, Citeseer, and PubMed, and 400 for other, larger graphs; K = k + 100. |
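Two quoted setup details above are concrete enough to sketch in code: the 5%/15%/80% random split used for LastFM and Flickr, and the per-iteration selection budget (k selected nodes out of K = k + 100 candidates, with k = 100 for the small Planetoid graphs and 400 otherwise). The sketch below is illustrative only; the function names and the use of Python's `random` module are assumptions, not the authors' released code.

```python
import random

def make_split(num_nodes, train_frac=0.05, val_frac=0.15, seed=0):
    """Random 5%/15%/80% train/val/test split, as described for LastFM and Flickr.

    Illustrative sketch; the paper's exact splitting procedure may differ.
    """
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    n_train = int(num_nodes * train_frac)
    n_val = int(num_nodes * val_frac)
    train = set(idx[:n_train])
    val = set(idx[n_train:n_train + n_val])
    test = set(idx[n_train + n_val:])  # remaining ~80% of nodes
    return train, val, test

def selection_budget(dataset):
    """Per-iteration node-selection sizes (k selected, K = k + 100 candidates)."""
    small_graphs = {"Cora", "Citeseer", "PubMed"}
    k = 100 if dataset in small_graphs else 400
    return k, k + 100
```

For example, `selection_budget("Cora")` yields `(100, 200)` and `selection_budget("Flickr")` yields `(400, 500)`, matching the quoted hyperparameter rule.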