Flow-Attentional Graph Neural Networks

Authors: Pascal Plettenberg, Dominik Köhler, Bernhard Sick, Josephine Thomas

TMLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on two flow graph datasets (electronic circuits and power grids), we demonstrate that flow attention enhances the performance of attention-based GNNs on both graph-level classification and regression tasks."
Researcher Affiliation | Academia | "Pascal Plettenberg (EMAIL), Intelligent Embedded Systems, University of Kassel; Dominik Köhler (EMAIL), Intelligent Embedded Systems, University of Kassel; Bernhard Sick (EMAIL), Intelligent Embedded Systems, University of Kassel; Josephine M. Thomas (EMAIL), GAIN Group, Institute of Data Science, University of Greifswald"
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. It defines equations for its proposed mechanisms (e.g., Eq. 8, 9, 12-17), but these are not formatted as pseudocode.
Open Source Code | Yes | "The code is available at https://github.com/pasplett/FlowGNN."
Open Datasets | Yes | "Dataset. We use the publicly available power grid data from the PowerGraph benchmark dataset (Varbella et al., 2024), which encompasses the IEEE24, IEEE39, IEEE118, and UK transmission systems." "Dataset. We utilize the Ckt-Bench101 dataset from the publicly available Open Circuit Benchmark (OCB) (Dong et al., 2023)." "Experiments on Standard Graph Datasets: Cora (McCallum et al., 2000), CiteSeer (Sen et al., 2008), and PubMed (Namata et al., 2012), as well as graph classification on the molecular property prediction dataset ogbg-molhiv (Hu et al., 2020)."
Dataset Splits | Yes | "We stick closely to the original benchmark setting in Varbella et al. (2024) by splitting the datasets into train/validation/test with ratios 85/5/10% and using the Adam optimizer (Kingma, 2014) with an initial learning rate of 10^-3 as well as a scheduler that reduces the learning rate by a factor of five if the validation accuracy plateaus for ten epochs." "We split the dataset into train/validation/test with ratios 80/10/10% and select the same test set as in Dong et al. (2023)."
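The 85/5/10% split quoted above can be sketched in plain Python; the function name, seed, and dataset size below are illustrative assumptions, not taken from the paper or its code:

```python
import random

def split_indices(n, ratios=(0.85, 0.05, 0.10), seed=0):
    """Shuffle dataset indices and partition them into train/val/test lists."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(1000)
# 1000 graphs -> 850 train, 50 validation, 100 test
```

Putting the remainder into the test set guarantees the three parts cover every index exactly once even when the ratios do not divide the dataset size evenly.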
Hardware Specification | Yes | "All efficiency experiments were carried out on NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions software like the "Adam optimizer" and "AdamW optimizer" as well as the "Optuna" framework, but does not provide specific version numbers for any software libraries or dependencies. For example, it does not state "PyTorch 1.9" or "Python 3.8".
Experiment Setup | Yes | "For each model, we perform a hyperparameter optimization by varying the number of message-passing layers (1, 2, 3) and the hidden dimension (8, 16, 32). Between subsequent message-passing layers, we apply the ReLU activation function followed by a dropout of 10%. We stick closely to the original benchmark setting in Varbella et al. (2024) by splitting the datasets into train/validation/test with ratios 85/5/10% and using the Adam optimizer (Kingma, 2014) with an initial learning rate of 10^-3 as well as a scheduler that reduces the learning rate by a factor of five if the validation accuracy plateaus for ten epochs. The negative log-likelihood is used as the loss function, and balanced accuracy (Brodersen et al., 2010) is used as the primary evaluation metric due to the strong class imbalance (see App. F). We train all models with a batch size of 16 for a maximum number of 500 epochs and apply early stopping with a patience of 20 epochs. Each training run is repeated five times with different random seeds." "For FlowDAGNN, we use two layers as described in Sec. 5.3 (each comprising one reverse and one forward pass) and adopt all other model parameters from DAGNN. The final prediction is done using a two-layer perceptron with a ReLU activation in between. Right before these final layers, we apply a dropout of 50% for regularization. Furthermore, we use the AdamW optimizer (Loshchilov, 2017) with an initial learning rate of 10^-4 and train each model using the mean squared error (MSE) as the loss function with a batch size of 64 for a maximum of 500 epochs, but apply early stopping with a patience of 20 epochs. Each training run is repeated ten times with different random seeds."
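Two pieces of the training protocol quoted above can be made concrete in plain Python: the plateau scheduler (reduce the learning rate by a factor of five after ten stagnant epochs) and balanced accuracy (the mean of per-class recalls). This is a minimal sketch under the assumption that validation accuracy is the monitored metric; the class and function names are illustrative, not from the paper's code:

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` when the monitored
    validation metric has not improved for `patience` epochs."""
    def __init__(self, lr=1e-3, factor=0.2, patience=10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        if val_metric > self.best:
            self.best, self.bad_epochs = val_metric, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor  # "reduces the learning rate by a factor of five"
                self.bad_epochs = 0
        return self.lr


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls -- insensitive to class imbalance."""
    recalls = []
    for c in set(y_true):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)
```

With factor = 0.2 and patience = 10, ten consecutive non-improving epochs shrink a 10^-3 learning rate to 2 * 10^-4, matching the quoted schedule; balanced accuracy averages recall over classes, so a majority-class-only predictor scores poorly despite high plain accuracy.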