Accurate Link Prediction for Edge-Incomplete Graphs via PU Learning
Authors: Junghun Kim, Ka Hyun Park, Hoyoung Yoon, U Kang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world datasets show that PULL consistently outperforms the baselines for predicting links in edge-incomplete graphs. [...] Experiments. Extensive experiments on real-world datasets show that PULL achieves the best performance. |
| Researcher Affiliation | Academia | Seoul National University, South Korea EMAIL |
| Pseudocode | Yes | Algorithm 1: Overall process of PULL. |
| Open Source Code | Yes | The code and datasets are available at https://github.com/snudatalab/PULL. |
| Open Datasets | Yes | We use seven real-world datasets which are summarized in Table 3b of the extended manuscript (Kim et al. 2024). PubMed and Cora-full are citation networks where nodes correspond to scientific publications and edges signify citation relationships between the papers. Each node has binary bag-of-words features. Chameleon and Crocodile are Wikipedia networks, with nodes representing web pages and edges representing hyperlinks between them. Node features include keywords or informative nouns extracted from the pages. Facebook is a social network where nodes represent users, and edges indicate mutual likes. Node features represent user-specific information such as age and gender. The code and datasets are available at https://github.com/snudatalab/PULL. |
| Dataset Splits | Yes | The validation and test sets consist of the missing edges and an equal number of randomly sampled non-edges. We vary the ratio rm of test missing edges in {0.1, 0.2}. The ratio of valid missing edges is set to 0.1 through the experiments. [...] We balance them by randomly sampling \|E_P ∪ E_r^P\| unconnected edges among E_r^U for every epoch. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'GCN' but does not specify exact software libraries or versions (e.g., PyTorch, TensorFlow, Python versions) that would be needed for replication. |
| Experiment Setup | Yes | For PULL, we set the number of inner loops as 200, and the maximum number of iterations as 10. The iterations stop if the current validation AUROC is smaller than that of the previous iteration. We use Adam optimizer with a learning rate of 0.01, and set the numbers of layers and hidden dimensions as 2 and 16, respectively, following the original GCN paper (2017) for fair comparison between the methods. For the hyperparameters of the baselines, we use the default settings described in their papers. We repeat the experiments ten times with different random seeds and present the results in terms of both average and standard deviation. |
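The split protocol quoted above (validation and test sets built from the held-out missing edges plus an equal number of randomly sampled non-edges, with missing-edge ratios such as 0.1 or 0.2) can be sketched as follows. This is a minimal illustration of that protocol, not the authors' released code; the function name and all implementation details are assumptions.

```python
import random

def make_link_prediction_splits(num_nodes, edges, r_test=0.1, r_valid=0.1, seed=0):
    """Hold out fractions r_test and r_valid of the observed edges as 'missing'
    edges, and pair each positive set with an equal number of sampled non-edges,
    as described in the Dataset Splits row above (details are assumptions)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    rng.shuffle(edges)
    n_test = int(len(edges) * r_test)
    n_valid = int(len(edges) * r_valid)
    test_pos = edges[:n_test]                      # test missing edges
    valid_pos = edges[n_test:n_test + n_valid]     # validation missing edges
    train_edges = edges[n_test + n_valid:]         # remaining observed edges
    edge_set = set(edges)

    def sample_non_edges(k):
        # Rejection-sample k node pairs that are not observed edges.
        out = set()
        while len(out) < k:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            if u != v and (u, v) not in edge_set and (v, u) not in edge_set:
                out.add((u, v))
        return list(out)

    valid = (valid_pos, sample_non_edges(len(valid_pos)))
    test = (test_pos, sample_non_edges(len(test_pos)))
    return train_edges, valid, test
```

Per-epoch rebalancing of training positives against freshly sampled unconnected pairs (the last sentence of the Dataset Splits excerpt) would reuse `sample_non_edges` with a new seed each epoch.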