Layer-diverse Negative Sampling for Graph Neural Networks
Authors: Wei Duan, Jie Lu, Yu Guang Wang, Junyu Xuan
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various real-world graph datasets demonstrate the effectiveness of our approach in improving the diversity of negative samples and overall learning performance. Our experiments aimed to address these questions: (1) Can the addition of negative samples obtained using our method improve the performance of GNNs compared to baseline methods? (Section 4.1) (2) Does our method result in negative samples with reduced redundancy? (Section 4.2) (3) Does our method yield consistent results even when fewer nodes are included in the negative sampling? (Section 4.2) (4) How would our negative sampling approach perform when applied to other GNN architectures? (Section 4.3) (5) Does incorporating these negative samples into graph convolution alleviate issues with over-smoothing and over-squashing? (Section 4.4) (6) What is the time complexity of the proposed method? (Section 4.5) (7) How do the sampling results of our LDGCN model compare to those of the D2GCN model in terms of overlap reduction and sample diversity? (Section 4.6) |
| Researcher Affiliation | Academia | Wei Duan (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney; Jie Lu (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney; Yu Guang Wang (EMAIL), Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University; Junyu Xuan (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney |
| Pseudocode | Yes | Algorithm 1: Get candidate set Sᵢ using the shortest-path-based method. Input: a graph G, sample length P, node i. (1) Compute the shortest-path lengths from i to all reachable nodes Vᵣ; (2) divide Vᵣ into sets Vₚ based on the path length p; (3) for p ← 2 to P do: randomly choose a node j in Vₚ; collect the first-order neighbours Nⱼ of j; Sᵢ ← Sᵢ ∪ Nⱼ ∪ {j}; end for. Output: candidate set Sᵢ |
| Open Source Code | No | The paper mentions that the AERO model is implemented by code published on GitHub and that GCN, SAGE, GATv2, and GIN-ϵ use PyTorch Geometric to implement their convolutional layers. However, there is no explicit statement or link indicating that the authors' own code for the proposed LDGCN method is open-source or publicly available. |
| Open Datasets | Yes | We first conducted our experiments with seven homophilous datasets for semi-supervised node classification, including citation networks: Citeseer, Cora and PubMed (Sen et al., 2008), Coauthor networks: CS (Shchur et al., 2018), Amazon networks: Computers and Photo (Shchur et al., 2018), and Open Graph Benchmark: ogbn-arxiv (Hu et al., 2020). Then, we expanded our experiments to three heterophilous datasets, including Cornell, Texas, and Wisconsin (Craven et al., 1998). The first six datasets are downloaded from PyTorch Geometric (PyG). The ogbn-arxiv is downloaded from Open Graph Benchmark (OGB). |
| Dataset Splits | Yes | The datasets were divided consistently with Kipf & Welling (2017). For the first 6 datasets, we choose 20 nodes for each class as the training set. For the ogbn-arxiv, because this graph is large, we choose 100 nodes for each class as the training set. From Table 14: Val / Test 500/1000 for Citeseer, Cora, PubMed, Coauthor CS, Computers, Photo. Val / Test 29799/48603 for ogbn-arxiv. Val / Test 59/37 for Cornell, Texas. Val / Test 80/51 for Wisconsin. From Table 17: Split Ratio 85/9/6 for ogbn-mag. |
| Hardware Specification | Yes | All experiments were conducted on an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz and NVIDIA A100 PCIe 80GB GPU. |
| Software Dependencies | Yes | The software that we use for experiments is Python 3.7.13, PyTorch 1.12.1, torch-geometric 2.1.0, torch-scatter 2.0.9, torch-sparse 0.6.15, torchvision 0.13.1, ogb 1.3.4, numpy 1.21.5 and CUDA 11.6. |
| Experiment Setup | Yes | We selected 1% of the nodes for negative sampling in each network layer. The datasets were divided consistently with Kipf & Welling (2017). Further information on the experimental setup and hyperparameters can be found in Appendix A.3. We set the maximum length of the shortest path P to 6 in Algorithm 1. The negative rate µ is a trainable parameter and is trained in all models. Each model was trained using an Adam optimiser with a learning rate of 0.02. The number of hidden channels is set to 16 for all models. Tests for each model with each dataset were conducted ten times. |
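The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched in plain Python. This is a minimal illustration, not the authors' code: the graph is assumed to be an adjacency dict, and the function and variable names are ours. It performs a BFS from node i to get shortest-path lengths, groups reachable nodes by distance, and for each path length p from 2 to P samples one node j and adds j plus its first-order neighbours to the candidate set.

```python
import random
from collections import deque

def candidate_set(adj, i, P=6, seed=None):
    """Sketch of the shortest-path-based candidate set (Algorithm 1).

    `adj` is assumed to map each node to a list of its neighbours;
    this is an illustrative reimplementation, not the paper's code.
    """
    rng = random.Random(seed)
    # Step 1: BFS from i gives the shortest-path length to every reachable node.
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Step 2: divide the reachable nodes V_r into sets V_p by path length p.
    by_length = {}
    for node, p in dist.items():
        by_length.setdefault(p, []).append(node)
    # Step 3: for p = 2..P, sample one node j at distance p and add j
    # together with its first-order neighbours N_j to the candidate set.
    S = set()
    for p in range(2, P + 1):
        if p in by_length:
            j = rng.choice(by_length[p])
            S |= set(adj[j]) | {j}
    return S
```

Starting at p = 2 (rather than 1) matches the quoted loop bound: immediate neighbours of i are skipped, so candidates come from farther layers of the graph.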
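The Dataset Splits row describes a Kipf & Welling (2017) style split: a fixed number of labeled training nodes per class, then fixed-size validation and test sets. A minimal sketch of that construction, under the assumption of per-class sampling followed by a shuffled remainder (the function name and selection order are ours, not taken from the paper's code):

```python
import random

def planetoid_style_split(labels, per_class=20, num_val=500, num_test=1000, seed=0):
    """Sketch of a Kipf & Welling-style split: `per_class` training nodes
    per class, then fixed-size val/test sets drawn from the remainder.

    Illustrative only; the paper's exact selection procedure may differ.
    """
    rng = random.Random(seed)
    # Sample `per_class` training indices from each class.
    train = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        train.extend(rng.sample(idx, per_class))
    # Shuffle the remaining nodes and carve out val/test sets.
    chosen = set(train)
    rest = [i for i in range(len(labels)) if i not in chosen]
    rng.shuffle(rest)
    return train, rest[:num_val], rest[num_val:num_val + num_test]
```

With the paper's settings (20 nodes per class, 500 validation, 1000 test) this reproduces the Val/Test sizes quoted from Table 14 for the first six datasets; ogbn-arxiv would use `per_class=100` with its larger fixed splits.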