Addressing caveats of neural persistence with deep graph persistence
Authors: Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To study NP of neural networks and validate our insights in practice, we train a large set of models. Namely, we train DNNs with exhaustive combinations of the following hyperparameters: number of layers ∈ {2, 3, 4}, hidden size ∈ {50, 100, 250, 650}, and activation function ∈ {tanh, relu}. [...] For each combination of hyperparameters and dataset, we train 20 models, each with a different initialisation and minibatch trajectory. If not stated otherwise, we analyse the models after training for 40 epochs. [...] In Table 1, we show covariate shift detection results for various corruptions and datasets. |
| Researcher Affiliation | Academia | Leander Girrbach: University of Tübingen, Tübingen AI Center; Anders Christensen: Technical University of Denmark; University of Tübingen, Tübingen AI Center; Ole Winther: Technical University of Denmark; University of Copenhagen; Copenhagen University Hospital (FindZebra); Zeynep Akata: University of Tübingen, Tübingen AI Center; A. Sophia Koepke: University of Tübingen, Tübingen AI Center |
| Pseudocode | Yes | In Algorithm 1, we show pseudocode for calculating the summary matrix S needed for deep graph persistence (see Definition 5.1). [...] Algorithm 1: Algorithm for calculating the summary matrix S for deep graph persistence |
| Open Source Code | Yes | Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence. |
| Open Datasets | Yes | We train models on three datasets, namely MNIST (LeCun et al., 1998), EMNIST (Cohen et al., 2017), and Fashion-MNIST (Xiao et al., 2017). [...] To evaluate the different methods on the covariate shift detection task, we train MLP models on MNIST, Fashion-MNIST and CIFAR-10. |
| Dataset Splits | Yes | Each model is trained for 40 epochs with batch size 32, and we keep a checkpoint after every quarter epoch. [...] Given l labeled training samples {(x_i, y_i)}_{i=1,...,l} (in our case, l = 1000) and a feature extraction function f : R^k → R^d that extracts MST weights (or other representations such as the vector of softmax outputs) [...] Additionally, we create samples with a varying ratio δ of corrupted samples, while all other images in the sample are non-corrupted. We include corrupted image ratios of δ ∈ {0.25, 0.5, 0.75}. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU models, or cloud computing instances) for running its experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer and the scipy.stats.kendalltau implementation, but it does not provide specific version numbers for any software libraries or dependencies. For example, it mentions Adam (Kingma & Ba, 2015) but not a version of a specific library implementing it. |
| Experiment Setup | Yes | Namely, we train DNNs with exhaustive combinations of the following hyperparameters: number of layers ∈ {2, 3, 4}, hidden size ∈ {50, 100, 250, 650}, and activation function ∈ {tanh, relu}. [...] We use the Adam optimizer (Kingma & Ba, 2015) with the same hyperparameters as Rieck et al., i.e. with a learning rate of 0.003, no weight decay, β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸. Each model is trained for 40 epochs with batch size 32. |
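The training protocol quoted above (an exhaustive hyperparameter grid with 20 differently seeded models per configuration) implies a fixed number of runs per dataset. A minimal sketch of the run enumeration, assuming the grid values stated in the paper (the function name and tuple layout are illustrative, not the authors' code):

```python
import itertools

# Hyperparameter grid as stated in the paper.
LAYERS = [2, 3, 4]
HIDDEN_SIZES = [50, 100, 250, 650]
ACTIVATIONS = ["tanh", "relu"]
SEEDS_PER_CONFIG = 20  # different initialisations / minibatch trajectories

def enumerate_runs():
    """Yield one (n_layers, hidden_size, activation, seed) tuple per model."""
    for n_layers, hidden, act in itertools.product(
            LAYERS, HIDDEN_SIZES, ACTIVATIONS):
        for seed in range(SEEDS_PER_CONFIG):
            yield n_layers, hidden, act, seed

runs = list(enumerate_runs())
print(len(runs))  # 3 * 4 * 2 configurations * 20 seeds = 480 models per dataset
```

Each of these runs would then be trained with Adam (lr = 0.003, β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸, no weight decay) for 40 epochs with batch size 32, per the setup row.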
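The dataset-splits row describes evaluation samples mixed so that a fraction δ ∈ {0.25, 0.5, 0.75} of images is corrupted. One way to construct such samples is sketched below; the helper name, sample size, and placeholder data are assumptions for illustration, not taken from the paper's code:

```python
import random

def make_eval_sample(clean, corrupted, delta, sample_size=100, rng=None):
    """Mix corrupted and clean items so that a fraction `delta` of the
    sample is corrupted (hypothetical helper, not the authors' code)."""
    rng = rng or random.Random(0)
    n_corrupt = round(delta * sample_size)
    sample = (rng.sample(corrupted, n_corrupt)
              + rng.sample(clean, sample_size - n_corrupt))
    rng.shuffle(sample)  # avoid a fixed clean/corrupt ordering
    return sample

# Placeholder pools standing in for clean and corrupted images.
clean = [("clean", i) for i in range(1000)]
corrupted = [("corrupt", i) for i in range(1000)]

for delta in (0.25, 0.5, 0.75):
    s = make_eval_sample(clean, corrupted, delta)
    frac = sum(1 for tag, _ in s if tag == "corrupt") / len(s)
    print(delta, frac)
```

With a sample size of 100, the realised corrupted fraction matches δ exactly for the three ratios used in the paper.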