[RE] GNNBoundary: Finding Boundaries and Going Beyond Them

Authors: Jan Henrik Bertrand, Lukas Bierling, Ina Klaric, Aron Wezenberg

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental This study aims to reproduce the findings of GNNBoundary: Towards Explaining Graph Neural Networks Through the Lens of Decision Boundaries (Wang & Shen, 2024). Their work supports three main claims: (1) the proposed algorithm can reliably identify adjacent class pairs, (2) GNNBoundary can effectively and consistently generate near-boundary graphs, outperforming the cross-entropy baseline, and (3) the generated near-boundary graphs can be used to accurately assess key properties of the decision boundary: margin, thickness, and complexity. We reproduced the experiments on the same datasets and extended them to two additional real-world datasets. Beyond that, we test different boundary probability ranges and their effect on decision boundary metrics, develop an additional baseline, and perform hyperparameter tuning.
Researcher Affiliation Academia Jan Henrik Bertrand* (EMAIL, University of Amsterdam); Lukas Bierling* (EMAIL, University of Amsterdam); Ina Klaric* (EMAIL, University of Amsterdam); Aron Wezenberg (EMAIL, University of Amsterdam)
Pseudocode No The paper describes the algorithms and methods using mathematical notation and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code Yes Code and instructions are available at: https://github.com/jhb300/re_gnnboundary.
Open Datasets Yes Motif (Wang & Shen (2023)), Collab (Yanardag & Vishwanathan (2015)), Enzymes (Schomburg et al. (2004)), and IMDB (Yanardag & Vishwanathan (2015)).
Dataset Splits No The paper discusses experiments on various datasets (Motif, Collab, Enzymes, IMDB, Reddit) and refers to 'sampling a graph from the dataset for each class', but it does not specify the training, validation, and test splits (e.g., percentages or counts) used for these datasets. It only implies using labeled training datasets for baseline comparisons.
Hardware Specification Yes However, such a large graph has up to n(n-1)/2 undirected edges, which for 550 nodes would be up to 150,975 edges. Hence, the time and memory complexity of GNNBoundary is O(n^2) w.r.t. the number of nodes n, leading to training times of around 330 minutes for a single boundary graph sampler on an Apple M3 chip.
Software Dependencies No The paper mentions several software components and libraries like 'GNNInterpreter', 'UMAP', and 'PCA', and also mentions an 'environment.yml and a pyproject.toml file' which facilitate setup, but it does not provide specific version numbers for any of these software dependencies within the text of the paper.
Experiment Setup Yes The search space consists of the sample size K, the target size, the target probabilities, the learning rate, the temperature, and the weight budget increase and decrease for the dynamic regularization scheduler. Details on the search space are given in appendix H. Moreover, we employed a simple custom loss for the hyperparameter tuning, namely the average deviation of the class probabilities from the target of 0.5... Table 8: Hyperparameter optimization results. HPO for the Reddit dataset was not possible under the given configurations due to the high graph size (cf. section 6.3).
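The HPO objective described above (average deviation of the class probabilities from the 0.5 target) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation; the function name and plain-list interface are assumptions:

```python
def boundary_deviation_loss(class_probs, target=0.5):
    """Hypothetical sketch of the custom HPO loss: the mean absolute
    deviation of the boundary classes' predicted probabilities from
    the target value (0.5 for a graph lying exactly on the boundary)."""
    deviations = [abs(p - target) for p in class_probs]
    return sum(deviations) / len(deviations)

# A near-boundary graph whose two adjacent classes receive
# probabilities 0.48 and 0.52 deviates by 0.02 on average:
loss = boundary_deviation_loss([0.48, 0.52])
```

A loss of zero would mean every generated graph sits exactly on the decision boundary between the two adjacent classes, which is why this quantity is a natural tuning objective for a boundary graph sampler.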