KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning
Authors: Roman Bresson, Giannis Nikolentzos, George Panagopoulos, Michail Chatzianastasis, Jun Pang, Michalis Vazirgiannis
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on node classification, link prediction, graph classification and graph regression datasets. Our results indicate that KANs are on-par with or better than MLPs on all tasks studied in this paper. In this section, we compare all three KAGNN models against MLP-based GNNs on the following tasks: node classification, link prediction, graph classification, and graph regression. |
| Researcher Affiliation | Academia | Roman Bresson (KTH Royal Institute of Technology, Sweden); Giannis Nikolentzos (University of Peloponnese, Greece); George Panagopoulos (University of Luxembourg, Luxembourg); Michail Chatzianastasis (École Polytechnique, IP Paris, France); Jun Pang (University of Luxembourg, Luxembourg); Michalis Vazirgiannis (École Polytechnique, IP Paris, France; KTH Royal Institute of Technology, Sweden) |
| Pseudocode | No | The paper describes the proposed KAGNN layers using mathematical formulations and textual explanations within sections 3 and 4, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code available at https://github.com/RomanBresson/KAGNN. |
| Open Datasets | Yes | To evaluate the performance of GNNs with KAN layers in the context of node classification, we use 7 well-known datasets of varying sizes and types, including homophilic (Cora, Citeseer (Kipf and Welling, 2017) and Ogbn-arxiv (Hu et al., 2020)) and heterophilic (Cornell, Texas, Wisconsin, Actor) networks (Zhu et al., 2021). In this setting, we focus on the task of link prediction. We use two datasets, Cora and CiteSeer, following the implementation and protocol provided in (Li et al., 2023; Mao et al., 2024). We experiment with the following 7 datasets: MUTAG, DD, NCI1, PROTEINS, ENZYMES, IMDB-B, IMDB-M. We experiment with two molecular datasets: (1) ZINC-12K (Irwin and Shoichet, 2005), and (2) QM9 (Ramakrishnan et al., 2014). |
| Dataset Splits | Yes | The homophilic networks are already split into training, validation, and test sets, while the heterophilic datasets are accompanied by fixed 10-fold cross-validation indices. We perform 10-fold cross-validation, while within each fold a model is selected based on a 90%/10% split of the training set. We use the pre-defined splits provided in (Errica et al., 2020). For ZINC-12K, the dataset is already split into training, validation and test sets (10,000, 1,000 and 1,000 graphs in the training, validation and test sets, respectively). For QM9, the dataset was divided into a training, a validation, and a test set according to an 80%/10%/10% split. |
| Hardware Specification | No | This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation and partner Swedish universities and industry. Part of the experiments was enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. Part of the experiments was also enabled by the Berzelius resource, which was made possible through application support provided by National Supercomputer Centre at Linköping University. The paper mentions general supercomputing resources (NAISS, Berzelius resource) but does not provide specific hardware details such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | All models are implemented with PyTorch (Paszke et al., 2019). For KAN layers, we rely on publicly available implementations for B-splines and for RBF. The paper mentions PyTorch and implementations for KAN layers but does not specify version numbers for PyTorch or any other software libraries, which is required for reproducibility. |
| Experiment Setup | Yes | For every dataset and model, we tune the values of the hyperparameters using the Optuna package (Akiba et al., 2019) with 100 trials (parameterizations), a TPE Sampler and set early stopping patience to 50. We use a different hyperparameter range for MLP-based models, B-Splines-based models and RBF-based models (see Table 8). We train each model for 1,000 epochs (early stopping with a patience of 20 epochs) by minimizing the cross entropy loss. We use the Adam optimizer for model training (Kingma and Ba, 2015). We set the batch size equal to 128 for all models. |
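The tuning protocol quoted in the Experiment Setup row (100 trials, best-parameterization selection, early-stopping patience) can be sketched with a stdlib-only random search standing in for Optuna's TPE sampler. This is a minimal illustrative sketch, not the authors' code: the search space, the `train_and_evaluate` callable, and the trial-level use of patience are all hypothetical placeholders (in the paper, patience applies to training epochs, and the actual hyperparameter ranges are given in its Table 8).

```python
import random

# Hypothetical search space; the paper's real ranges (per model family:
# MLP, B-splines, RBF) are listed in its Table 8.
SEARCH_SPACE = {
    "hidden_dim": [16, 32, 64, 128],
    "num_layers": [2, 3, 4],
    "lr": [1e-4, 1e-3, 1e-2],
}

def sample_trial(rng):
    """Draw one parameterization, as a sampler would propose one trial."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def tune(train_and_evaluate, n_trials=100, patience=50, seed=0):
    """Random-search stand-in for Optuna's study loop: keep the best
    validation score, stop after `patience` trials without improvement."""
    rng = random.Random(seed)
    best_params, best_score, stale = None, float("-inf"), 0
    for _ in range(n_trials):
        params = sample_trial(rng)
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score, stale = params, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_params, best_score
```

In the paper's setting, `train_and_evaluate` would train a KAGNN for up to 1,000 epochs with Adam (batch size 128) and return validation performance; here it is any callable mapping a parameter dict to a score.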