Layer-diverse Negative Sampling for Graph Neural Networks
Authors: Wei Duan, Jie Lu, Yu Guang Wang, Junyu Xuan
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various real-world graph datasets demonstrate the effectiveness of our approach in improving the diversity of negative samples and overall learning performance. Our experiments aimed to address these questions: (1) Can the addition of negative samples obtained using our method improve the performance of GNNs compared to baseline methods? (Section 4.1) (2) Does our method result in negative samples with reduced redundancy? (Section 4.2) (3) Does our method yield consistent results even when fewer nodes are included in the negative sampling? (Section 4.2) (4) How would our negative sampling approach perform when applied to other GNN architectures? (Section 4.3) (5) Does incorporating these negative samples into graph convolution alleviate issues with over-smoothing and over-squashing? (Section 4.4) (6) What is the time complexity of the proposed method? (Section 4.5) (7) How do the sampling results of our LDGCN model compare to those of the D2GCN model in terms of overlap reduction and sample diversity? (Section 4.6) |
| Researcher Affiliation | Academia | Wei Duan (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney; Jie Lu (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney; Yu Guang Wang (EMAIL), Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University; Junyu Xuan (EMAIL), Australian Artificial Intelligence Institute, University of Technology Sydney |
| Pseudocode | Yes | Algorithm 1: Get candidate set Sᵢ using the shortest-path-based method. Input: a graph G, sample length P, node i. (1) Compute the shortest-path lengths from i to all reachable nodes Vᵣ; (2) divide Vᵣ into sets Vₚ based on the path length p; (3) for p ← 2 to P do: randomly choose a node j in Vₚ; collect the first-order neighbours Nⱼ of j; Sᵢ ← Sᵢ ∪ Nⱼ ∪ {j}; end for. Output: candidate set Sᵢ |
| Open Source Code | No | The paper mentions that the AERO model is implemented by code published on GitHub and that GCN, SAGE, GATv2, and GIN-ϵ use PyTorch Geometric to implement their convolutional layers. However, there is no explicit statement or link indicating that the authors' own code for the proposed LDGCN method is open-source or publicly available. |
| Open Datasets | Yes | We first conducted our experiments with seven homophilous datasets for semi-supervised node classification, including citation networks: Citeseer, Cora and PubMed (Sen et al., 2008), Coauthor networks: CS (Shchur et al., 2018), Amazon networks: Computers and Photo (Shchur et al., 2018), and Open Graph Benchmark: ogbn-arxiv (Hu et al., 2020). Then, we expanded our experiments to three heterophilous datasets, including Cornell, Texas, and Wisconsin (Craven et al., 1998). The first six datasets are downloaded from PyTorch Geometric (PyG). The ogbn-arxiv is downloaded from Open Graph Benchmark (OGB). |
| Dataset Splits | Yes | The datasets were divided consistently with Kipf & Welling (2017). For the first 6 datasets, we choose 20 nodes for each class as the training set. For the ogbn-arxiv, because this graph is large, we choose 100 nodes for each class as the training set. From Table 14: Val / Test 500/1000 for Citeseer, Cora, PubMed, Coauthor CS, Computers, Photo. Val / Test 29799/48603 for ogbn-arxiv. Val / Test 59/37 for Cornell, Texas. Val / Test 80/51 for Wisconsin. From Table 17: Split Ratio 85/9/6 for ogbn-mag. |
| Hardware Specification | Yes | All experiments were conducted on an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz and NVIDIA A100 PCIe 80GB GPU. |
| Software Dependencies | Yes | The software that we use for experiments is Python 3.7.13, PyTorch 1.12.1, torch-geometric 2.1.0, torch-scatter 2.0.9, torch-sparse 0.6.15, torchvision 0.13.1, ogb 1.3.4, numpy 1.21.5 and CUDA 11.6. |
| Experiment Setup | Yes | We selected 1% of the nodes for negative sampling in each network layer. The datasets were divided consistently with Kipf & Welling (2017). Further information on the experimental setup and hyperparameters can be found in Appendix A.3. We set the maximum length of the shortest path P to 6 in Algorithm 1. The negative rate µ is a trainable parameter and is trained in all models. Each model was trained using an Adam optimiser with a learning rate of 0.02. The number of hidden channels is set to 16 for all models. Tests for each model with each dataset were conducted ten times. |
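The Algorithm 1 excerpt quoted in the Pseudocode row can be sketched in plain Python. This is a minimal illustration, not the authors' code: the graph is assumed to be an adjacency dict, and the function and variable names are ours. It performs a BFS from node i to get shortest-path lengths, groups reachable nodes by distance, and for each path length p from 2 to P samples one node j and adds j plus its first-order neighbours to the candidate set.

```python
import random
from collections import deque

def candidate_set(adj, i, P=6, seed=None):
    """Sketch of the shortest-path-based candidate set (Algorithm 1).

    `adj` is assumed to map each node to a list of its neighbours;
    this is an illustrative reimplementation, not the paper's code.
    """
    rng = random.Random(seed)
    # Step 1: BFS from i gives the shortest-path length to every reachable node.
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Step 2: divide the reachable nodes V_r into sets V_p by path length p.
    by_length = {}
    for node, p in dist.items():
        by_length.setdefault(p, []).append(node)
    # Step 3: for p = 2..P, sample one node j at distance p and add j
    # together with its first-order neighbours N_j to the candidate set.
    S = set()
    for p in range(2, P + 1):
        if p in by_length:
            j = rng.choice(by_length[p])
            S |= set(adj[j]) | {j}
    return S
```

Starting at p = 2 (rather than 1) matches the quoted loop bound: immediate neighbours of i are skipped, so candidates come from farther layers of the graph.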
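The Dataset Splits row describes a Kipf & Welling (2017) style split: a fixed number of labeled training nodes per class, then fixed-size validation and test sets. A minimal sketch of that construction, under the assumption of per-class sampling followed by a shuffled remainder (the function name and selection order are ours, not taken from the paper's code):

```python
import random

def planetoid_style_split(labels, per_class=20, num_val=500, num_test=1000, seed=0):
    """Sketch of a Kipf & Welling-style split: `per_class` training nodes
    per class, then fixed-size val/test sets drawn from the remainder.

    Illustrative only; the paper's exact selection procedure may differ.
    """
    rng = random.Random(seed)
    # Sample `per_class` training indices from each class.
    train = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        train.extend(rng.sample(idx, per_class))
    # Shuffle the remaining nodes and carve out val/test sets.
    chosen = set(train)
    rest = [i for i in range(len(labels)) if i not in chosen]
    rng.shuffle(rest)
    return train, rest[:num_val], rest[num_val:num_val + num_test]
```

With the paper's settings (20 nodes per class, 500 validation, 1000 test) this reproduces the Val/Test sizes quoted from Table 14 for the first six datasets; ogbn-arxiv would use `per_class=100` with its larger fixed splits.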