Are Population Graphs Really as Powerful as Believed?
Authors: Tamara T. Müller, Sophie Starck, Kyriaki-Margarita Bintsi, Alexander Ziller, Rickmer Braren, Georgios Kaissis, Daniel Rueckert
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | However, in this work, we raise the question of whether existing methods are really strong enough by showing that simple baseline methods, such as random forests or linear regressions, perform on par with advanced graph learning models on several population graph datasets for a variety of different clinical applications. We use the commonly used public population graph datasets TADPOLE and ABIDE, a brain age estimation and a cardiac dataset from the UK Biobank, and a real-world in-house COVID dataset. We (a) investigate the impact of different graph construction methods, graph convolutions, and dataset size and complexity on GNN performance and (b) discuss the utility of GNNs for multi-modal data integration in the context of population graphs. |
| Researcher Affiliation | Academia | 1 AI in Medicine and Healthcare, Technical University of Munich, Germany; 2 BioMedIA, Imperial College London, UK; 3 Institute for Diagnostic and Interventional Radiology, Technical University of Munich, Germany; 4 Machine Learning in Biomedical Imaging, Helmholtz Munich, Germany. Contact: EMAIL |
| Pseudocode | No | The paper describes algorithms and methods using mathematical definitions and textual descriptions, but does not contain any structured pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | 1The source code for this work can be found at: https://github.com/tamaramueller/population_graphs |
| Open Datasets | Yes | First, we use the commonly used subset of the TADPOLE dataset (Yu et al., 2020) that is, for example, used in Kazi et al. (2022). A second public and frequently used dataset for population graph studies is the Autism Brain Imaging Data Exchange (ABIDE) dataset (Di Martino et al., 2014). Furthermore, we use a small real-world medical dataset of COVID patients that has also been used before in population graph settings (Keicher et al., 2021); however, in a slightly different version of the dataset. Additionally, we use a larger population graph dataset from the UK Biobank (UKBB) (Sudlow et al., 2015). In order to evaluate the impact of the graph construction method and the resulting graph structure on the performance of the GNN, we also utilise three benchmark citation datasets: CORA, CITESEER, and PUBMED (Yang et al., 2016). |
| Dataset Splits | Yes | Table 1: Overview of all utilised population graph datasets with the respective number of nodes, train/validation/test samples, node features, and classes. Dataset (nodes; train/val/test; features; classes): TADPOLE (564; 468/48/57; 30; 3), ABIDE (871; 609/41/221; 6105; 2), UKBB cardiac (2900; 2320/58/522; 89; 2), COVID (65; 45/4/16; 29; 2), UKBB brain age (6406; 4811/1276/319; 88; regression). |
| Hardware Specification | Yes | All trainings are performed on an Nvidia Quadro RTX 8000 GPU, using Pytorch lightning and Pytorch Geometric (Fey & Lenssen, 2019). |
| Software Dependencies | No | The paper mentions using PyTorch Lightning, PyTorch Geometric, and scikit-learn, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We define a fixed set of hyperparameters for all experiments and run a hyperparameter search for at least 200 runs using sweeps from Weights and Biases (Biewald, 2020). We then pick the run with the best validation accuracy/MAE, evaluate its performance over 5 random seeds, and report the mean test accuracy with the standard deviation. All trainings are performed on an Nvidia Quadro RTX 8000 GPU, using Pytorch lightning and Pytorch Geometric (Fey & Lenssen, 2019). The hyperparameters can be found in the appendix. (Referring to Appendix B, Table 10 which lists hyperparameter ranges for sweeps like Learning rate, Dropout, k, Nr. layers, Hidden channels etc.) |
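The evaluation protocol quoted above (pick the best run by validation performance, then report mean test accuracy with standard deviation over 5 random seeds) can be sketched for one of the paper's simple baselines. This is not the authors' code: the random-forest baseline below runs on synthetic tabular data whose shape loosely follows the TADPOLE row of Table 1 (564 samples, 30 features, 3 classes, 57 test samples); the actual datasets, splits, and hyperparameters differ.

```python
# Hedged sketch of the seed-averaged evaluation protocol with a simple
# random-forest baseline. Synthetic data stands in for the real node
# features; sizes loosely mirror the TADPOLE entry in Table 1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 564 samples, 30 features, 3 classes (assumption).
X, y = make_classification(n_samples=564, n_features=30, n_classes=3,
                           n_informative=10, random_state=0)
# Fixed split with 57 test samples, as in the TADPOLE row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=57, random_state=0)

# Evaluate the chosen configuration over 5 random seeds.
scores = []
for seed in range(5):
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))

# Report mean test accuracy with standard deviation.
print(f"test accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

The only seed-dependent component here is the classifier itself; in the paper, both the model and the hyperparameter sweep (via Weights and Biases) would feed into this loop.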