Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
Authors: Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive OOD detection experiments are conducted on five benchmark graph datasets, verifying the superior performance of GOLD without using real OOD data compared with the state-of-the-art OOD exposure and non-exposure baselines. Table 1: Model performance comparison: out-of-distribution detection results are measured by AUROC (↑) / AUPR (↑) / FPR95 (↓) (%) and in-distribution classification results are measured by accuracy (ID ACC) (↑). |
| Researcher Affiliation | Academia | Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang The University of Queensland EMAIL |
| Pseudocode | Yes | Algorithm 1 Adversarial Optimisation of GOLD. Input: ID graph G = (A, X), randomly initialised GNN, MLP detector, and latent generator D, epoch numbers M1 and M2, loss coefficients λ, µ, γ. Output: Optimised GNN, MLP detector, and generative model D. 1: while train do 2: Obtain H for ID data with GNN from Eq. 7 3: for epoch = 1, . . . , M1 do // Step 1 4: Train D with LGen and H from Eq. 8 5: end for 6: Sample noise HN from Normal distribution 7: Generate pseudo-OOD Hp-OOD with D and HT from Eq. 9 8: for epoch = 1, . . . , M2 do // Step 2 9: Train GNN with LCLS and H from Eq. 6 10: Train GNN and MLP with LDiv, H and Hp-OOD from Eq. 13 11: end for 12: end while |
| Open Source Code | Yes | Code is available at https://github.com/DannyW618/GOLD. 2. Model training. Our implementation of the energy-based OOD detector builds upon the open-sourced work GNNSAFE by Wu et al. (2023b), https://github.com/qitianwu/GraphOOD-GNNSafe. |
| Open Datasets | Yes | Following Wu et al. (2023b), five benchmark datasets are used for OOD detection evaluation, including four single-graph datasets: (1) Cora, (2) Amazon-Photo, (3) Coauthor-CS, with synthetic OOD data created via: structure manipulation, feature interpolation, and label leave-out; and (4) ogbn-Arxiv, OOD by year, and (5) one multi-graph scenario: Twitch Gamers Explicit, OOD by different graphs. Detailed splits are provided in Appendix A.6. A.6 DESCRIPTION OF DATASETS The datasets utilised in this study are publicly available benchmark datasets for graph learning. We follow the same data collection and processing protocol in Wu et al. (2023b) and utilised the data loader for the ogbn-Arxiv dataset provided by the OGB package, and others from the PyTorch Geometric package. For all datasets, we follow the provided splits and generation process in Wu et al. (2023b). We provide a brief description of the datasets below: The Twitch Gamers Explicit dataset consists of multiple subgraphs, each representing a social network from a different region (Rozemberczki & Sarkar, 2021). ... The Cora dataset is a citation network where each node represents a published paper, and each edge reflects a citation relationship between papers (Sen et al., 2008). ... The Amazon-Photo dataset forms an item co-purchasing network on Amazon, where each node represents a product and each edge signifies that the linked products are frequently bought together (McAuley et al., 2015). ... The ogbn-Arxiv dataset curated an extensive dataset from 1960 to 2020, where each node represents a paper, labelled by its subject area for classification (Hu et al., 2020). |
| Dataset Splits | Yes | Detailed splits are provided in Appendix A.6. A.6 DESCRIPTION OF DATASETS For all datasets, we follow the provided splits and generation process in Wu et al. (2023b). ... We utilise subgraph DE as ID data, and subgraphs ES, FR, RU as testing data. ... Cora-L (ID): 904 nodes, 10,556 edges, 1,433 features, 3 classes; Cora-L (OOD): 986 nodes, 10,556 edges, 1,433 features, 3 classes. ... Arxiv-2015 (ID): 53,160 nodes, 152,226 edges, 128 features, 40 classes; Arxiv-2018 (OOD): 29,799 nodes, 622,466 edges, 128 features, 40 classes; Arxiv-2019 (OOD): 39,711 nodes, 1,061,197 edges, 128 features, 40 classes; Arxiv-2020 (OOD): 8,892 nodes, 1,166,243 edges, 128 features, 40 classes. |
| Hardware Specification | Yes | A.8 IMPLEMENTATION DETAILS The experiments were conducted using Python 3.8.0 and PyTorch 2.2.2 with CUDA 12.1, using Tesla V100 GPUs with 32GB memory for experiments. |
| Software Dependencies | Yes | A.8 IMPLEMENTATION DETAILS The experiments were conducted using Python 3.8.0 and PyTorch 2.2.2 with CUDA 12.1, using Tesla V100 GPUs with 32GB memory for experiments. |
| Experiment Setup | Yes | Implementations. For a fair comparison, GCN is used as the backbone across all methods, with a layer depth of 2 and a hidden size of 64. The propagation iteration k in Eq. 5 is set to 2, and the controlling parameter α of 0.5 is used. For LDM, the timestep T is configured within {600, 800, 1000}, β1 = 10⁻⁴, and βT = 0.02. The denoising network D and the MLP detector model are implemented with varying layer and hidden dimension sizes within {2, 3} and {128, 256, 512} respectively, subject to the dataset. Additional hyperparameter analysis and parameter details are provided in Appendix A.11. We use the Adam optimizer for optimisation (Kingma & Ba, 2015). A.8 IMPLEMENTATION DETAILS Extending beyond the thresholds provided in Wu et al. (2023b), we tuned the margins t_ID and t_OOD with various ranges for different datasets (i.e., for Twitch t_ID ∈ {−5, −4, −3}, t_OOD ∈ {1, 2, 3}). The detector loss weights λ, µ, γ are tuned in the range of {0, 0.3, 0.5, 0.7, 1, 1.5}, depending on the dataset. Hyperparameter sensitivity analysis for the detector and classifier loss objective can be found in Figure 6. The LGM training step M1 is configured in the range of {100, 200, 600, 800}, and the classifier and detector update M2 is tuned from {5, ..., 20} subject to the dataset, with early stopping applied to ensure the ID accuracy does not reduce significantly. |
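The table above reports results in AUROC / AUPR / FPR95, the standard OOD-detection metrics. As a minimal illustration of how two of them are computed (not the paper's implementation; function names and the score convention "higher score = more in-distribution" are assumptions for this sketch):

```python
def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random ID sample scores higher
    than a random OOD sample, counting ties as half."""
    wins = sum(1 for a in id_scores for b in ood_scores if a > b)
    ties = sum(1 for a in id_scores for b in ood_scores if a == b)
    return (wins + 0.5 * ties) / (len(id_scores) * len(ood_scores))

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples scoring at or above the threshold
    that still accepts at least 95% of ID samples."""
    s = sorted(id_scores)
    # index of the lowest score we may discard while keeping TPR >= 95%
    k = int(0.05 * len(s))
    thresh = s[k]
    return sum(1 for b in ood_scores if b >= thresh) / len(ood_scores)

# Perfectly separated scores: AUROC = 1.0, FPR95 = 0.0.
print(auroc([2.0, 1.5, 1.8, 2.2], [0.1, 0.3]))         # → 1.0
print(fpr_at_95_tpr([2.0, 1.5, 1.8, 2.2], [0.1, 0.3]))  # → 0.0
```

A lower FPR95 means fewer OOD nodes slip past the detector at the operating point where 95% of ID nodes are retained, which is why Table 1 marks it with (↓) while AUROC and AUPR carry (↑).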
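The LDM settings quoted in the Experiment Setup row (T ∈ {600, 800, 1000}, β1 = 10⁻⁴, βT = 0.02) match the standard DDPM-style linear noise schedule. A minimal sketch of that schedule, assuming linear interpolation between the two endpoints (the function name is illustrative, not from the paper):

```python
def linear_beta_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Linearly interpolate betas from beta_1 to beta_T over T steps and
    accumulate alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    betas = [beta_1 + (beta_T - beta_1) * t / (T - 1) for t in range(T)]
    alpha_bar, alpha_bars = 1.0, []
    for b in betas:
        alpha_bar *= 1.0 - b
        alpha_bars.append(alpha_bar)
    return betas, alpha_bars

betas, alpha_bars = linear_beta_schedule(1000)
# alpha_bars decreases monotonically toward ~0, so the latent at step T
# is close to pure Gaussian noise -- the regime from which pseudo-OOD
# embeddings are generated.
```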