Noise-robust Graph Learning by Estimating and Leveraging Pairwise Interactions

Authors: Xuefeng Du, Tian Bian, Yu Rong, Bo Han, Tongliang Liu, Tingyang Xu, Wenbing Huang, Yixuan Li, Junzhou Huang

TMLR 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on different datasets and GNN architectures demonstrate the effectiveness of PI-GNN, yielding a promising improvement over the state-of-the-art methods. Code is publicly available at https://github.com/TianBian95/pi-gnn. ... In this section, we present empirical evidence to validate the effectiveness of PI-GNN on different datasets with different noise types and ratios. ... Table 1: Test accuracy on 5 datasets for PI-GNN with GCN as the backbone. ... Figure 3: Test accuracy of PI-GNN and comparison with PI-GNN w/o pc and vanilla GNN on two additional model architectures under different noisy settings.
Researcher Affiliation | Collaboration | Xuefeng Du (University of Wisconsin-Madison); Tian Bian (The Chinese University of Hong Kong); Yu Rong (Tencent AI Lab); Bo Han (Hong Kong Baptist University); Tongliang Liu (Mohamed bin Zayed University of Artificial Intelligence and The University of Sydney); Tingyang Xu (Tencent AI Lab); Wenbing Huang (Renmin University of China); Yixuan Li (University of Wisconsin-Madison); Junzhou Huang (University of Texas at Arlington)
Pseudocode | Yes | Algorithm 1 PI-GNN: Noise-robust Graph Learning by Estimating and Leveraging Pairwise Interactions
Input: Input graph G = (V, A, X) with noisy training data D_tr = {(A, X_v, y_v)}_{v ∈ V}; randomly initialized GNNs f_e and f_t with parameters θ_e and θ_t; regularization loss weight β; pretraining epochs K for f_e; total training epochs N.
Output: Robust GNN f_t.
for epoch = 0; epoch < N; epoch++ do
    if epoch < K then
        Update the parameters θ_e of the PI label estimation model f_e by Equation 3.
        Set β = 0 in Equation 5 and update the parameters θ_t of the node classification model f_t.
    else
        Update the parameters θ_e of the PI label estimation model f_e by Equation 3.
        Estimate the PI label y^PI by Equation 4 with f_e.
        Update the parameters θ_t of the node classification model f_t by Equation 5.
    end
end
return the node classification model f_t.
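Algorithm 1's two-phase training can be summarized as a schedule: K pretraining epochs where f_t is trained with β = 0, followed by joint epochs where PI labels are estimated and the full objective is used. The sketch below captures only that control flow; the actual GNN updates (Equations 3-5) are abstracted away, and the function name is ours, not the paper's.

```python
def pi_gnn_schedule(total_epochs, pretrain_epochs):
    """Mirror Algorithm 1's two-phase control flow.

    Phase 1 (epoch < K): update f_e (Eq. 3) and train f_t with beta = 0,
    i.e. without the PI regularization term in Eq. 5.
    Phase 2 (epoch >= K): update f_e, estimate PI labels y^PI (Eq. 4),
    then update f_t with the full objective (Eq. 5).
    Only the schedule is modeled; model updates are omitted.
    """
    phases = []
    for epoch in range(total_epochs):
        if epoch < pretrain_epochs:
            phases.append("pretrain")  # beta = 0, no PI labels yet
        else:
            phases.append("joint")     # estimate y^PI, beta > 0
    return phases
```

With the reported N = 400 and K = 50, exactly the first 50 epochs run in the pretraining phase.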
Open Source Code | Yes | Code is publicly available at https://github.com/TianBian95/pi-gnn.
Open Datasets | Yes | We used five datasets to evaluate PI-GNN, including Cora, CiteSeer and PubMed with the default dataset split as in (Kipf & Welling, 2017), the WikiCS dataset (Mernyei & Cangea, 2020), and the OGB-arxiv dataset (Hu et al., 2020).
Dataset Splits | Yes | We used five datasets to evaluate PI-GNN, including Cora, CiteSeer and PubMed with the default dataset split as in (Kipf & Welling, 2017), the WikiCS dataset (Mernyei & Cangea, 2020), and the OGB-arxiv dataset (Hu et al., 2020). For WikiCS, we used the first 20 nodes from each class for training and the next 20 nodes for validation; the remaining nodes of each class are used as the test set. For OGB-arxiv, we use the default split.
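The WikiCS split described above (first 20 nodes per class for training, next 20 for validation, the rest for testing) is easy to get subtly wrong, so a small sketch may help. This is our illustration, not the authors' code; the function name is hypothetical.

```python
from collections import defaultdict

def wikics_style_split(labels, n_train=20, n_val=20):
    """Per-class split as described for WikiCS: the first n_train nodes of
    each class go to training, the next n_val to validation, and the rest
    to test. `labels` is a list of integer class labels indexed by node id.
    """
    seen = defaultdict(int)  # nodes of each class encountered so far
    train, val, test = [], [], []
    for node, y in enumerate(labels):
        if seen[y] < n_train:
            train.append(node)
        elif seen[y] < n_train + n_val:
            val.append(node)
        else:
            test.append(node)
        seen[y] += 1
    return train, val, test
```

Note that "first" here follows the dataset's node ordering, which is one reasonable reading of the text.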
Hardware Specification | Yes | We trained for 400 epochs on a Tesla P40.
Software Dependencies | Yes | We run all experiments with Python 3.8.5 and PyTorch 1.7.0, using NVIDIA Tesla P40 GPUs. ... We used three different GNN architectures, i.e., GCN, GAT and GraphSAGE, which are implemented with torch-geometric (Fey & Lenssen, 2019).
Experiment Setup | Yes | Specifically, the hidden dimension of GCN, GAT and GraphSAGE is set to 16, 8 and 64, respectively. GAT has 8 attention heads in the first layer and 1 head in the second layer. The mean aggregator is used for GraphSAGE. We applied the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.01 for GCN and GraphSAGE and 0.005 for GAT. The weight decay is set to 5e-4. We trained for 400 epochs on a Tesla P40. The loss weight β is set to |V|² / (|V|² − Q)², where |V| is the number of nodes and Q is the sum of all elements of the preprocessed adjacency matrix. The number of pretraining epochs K is set to 50 and the total number of epochs N is set to 400. For subgraph sampling, we sampled 15 and 10 neighbors for each node in the 1st and 2nd GNN layers, respectively, and set the batch size to 1024.
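The reported hyperparameters can be collected into a single configuration for reference. The dictionary below only restates values from the setup text; the names (`HPARAMS`, `loss_weight_beta`, etc.) are ours, and the subtraction in the β formula is a reconstruction of the garbled expression in the text, so it should be checked against the paper before reuse.

```python
# Hyperparameters as reported in the experiment setup. Architecture names are
# labels only; no torch-geometric model is constructed here.
HPARAMS = {
    "GCN":       {"hidden_dim": 16, "lr": 0.01,  "weight_decay": 5e-4},
    "GAT":       {"hidden_dim": 8,  "lr": 0.005, "weight_decay": 5e-4,
                  "heads": (8, 1)},  # 8 heads in layer 1, 1 head in layer 2
    "GraphSAGE": {"hidden_dim": 64, "lr": 0.01,  "weight_decay": 5e-4,
                  "aggregator": "mean"},
}
PRETRAIN_K, TOTAL_EPOCHS = 50, 400
NEIGHBOR_FANOUT = (15, 10)  # neighbors sampled per node in layers 1 and 2
BATCH_SIZE = 1024

def loss_weight_beta(num_nodes, adj_sum):
    """beta = |V|^2 / (|V|^2 - Q)^2, where Q is the sum of all entries of the
    preprocessed adjacency matrix. The minus sign is reconstructed from the
    garbled formula in the text and may differ from the paper's exact form."""
    v2 = num_nodes ** 2
    return v2 / (v2 - adj_sum) ** 2
```

Keeping the configuration in one place like this makes it straightforward to sweep architectures while holding the PI-GNN schedule (K, N, fan-out, batch size) fixed.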