reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantifying Network Similarity using Graph Cumulants

Authors: Gecia Bravo-Hermsdorff, Lee M. Gunderson, Pierre-André Maugis, Carey E. Priebe

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate via theory, simulation, and application to real data the superior statistical power of using graph cumulants. In summary, when analyzing data using subgraph/motif densities, we suggest using the corresponding graph cumulants instead.
Researcher Affiliation	Collaboration	Gecia Bravo-Hermsdorff EMAIL Department of Statistical Sciences University College London London, WC1E 7H8, United Kingdom; Lee M. Gunderson EMAIL Gatsby Computational Neuroscience Unit University College London London, W1T 4JG, United Kingdom; Pierre-André Maugis EMAIL Google Research Zürich, 8002, Switzerland; Carey E. Priebe EMAIL Department of Applied Mathematics and Statistics Whiting School of Engineering Johns Hopkins University Baltimore, MD 21218, USA
Pseudocode	No	The paper describes the steps of the statistical test in prose, for example, in Section 6.1: '1. choose r, the maximum order of the subgraph statistics being considered; 2. for each sample, estimate its distribution using these subgraph statistics; 3. quantify the difference between distributions using a notion of distance for the space of these subgraph statistics.' However, it does not present these steps in a structured pseudocode or algorithm block.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories.
Open Datasets	Yes	In particular, we use gene interaction networks from four different species: Mouse, Rat, Human, and Arabidopsis (a small flowering plant related to cabbage and mustard), all curated by the FunCoup repository (Persson et al., 2021).
Dataset Splits	No	The paper describes how synthetic data are generated as 'two samples, G_A and G_B, each containing s graphs sampled i.i.d. from unknown distributions G_A and G_B, respectively'. For real data, it states 'Each sample contains s graphs, all obtained from one of these adjusted networks by subsampling its nodes.' This describes data generation for the comparison, not specific training/test/validation splits for a machine learning model.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers for replication.
Experiment Setup	No	The paper describes the parameters of the statistical tests and data generation (e.g., 'choose r, the maximum order of the subgraph statistics being considered', 'Stochastic Block Models (SBMs) with two equal-sized communities and expected edge density ρ', 's graphs per sample', 'n nodes'). However, it does not include typical machine learning hyperparameters like learning rates, batch sizes, or optimizer settings, nor system-level training configurations, as the paper focuses on statistical tests rather than training models.
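
The three test steps quoted in the Pseudocode row (choose an order, summarize each sample by subgraph statistics, measure a distance between the summaries) can be illustrated with a minimal sketch. This is not the authors' method: it uses plain edge, wedge, and triangle densities with a Euclidean distance rather than graph cumulants and the paper's own notion of distance. The function names and the SBM parameterization (within-block probability rho + eps, between-block probability rho - eps) are assumptions made only for this illustration.

```python
import numpy as np


def sample_sbm(n, rho, eps, rng):
    """Sample a symmetric 0/1 adjacency matrix from a two-block SBM.

    Hypothetical parameterization: nodes alternate between two equal-sized
    blocks; edges appear with probability rho + eps within a block and
    rho - eps between blocks.
    """
    labels = np.arange(n) % 2
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, rho + eps, rho - eps)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)


def subgraph_densities(adj):
    """Return the edge, wedge (2-path), and triangle densities of a graph."""
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    triples = n * (n - 1) * (n - 2)
    edge = adj.sum() / (n * (n - 1))
    wedge = (degrees * (degrees - 1)).sum() / triples
    triangle = np.trace(adj @ adj @ adj) / triples
    return np.array([edge, wedge, triangle])


def sample_distance(sample_a, sample_b):
    """Euclidean distance between the mean density vectors of two samples."""
    mean_a = np.mean([subgraph_densities(g) for g in sample_a], axis=0)
    mean_b = np.mean([subgraph_densities(g) for g in sample_b], axis=0)
    return float(np.linalg.norm(mean_a - mean_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two samples of s = 4 graphs on n = 64 nodes with the same base density.
    sample_a = [sample_sbm(64, 0.1, 0.0, rng) for _ in range(4)]
    sample_b = [sample_sbm(64, 0.1, 0.05, rng) for _ in range(4)]
    print(sample_distance(sample_a, sample_b))
```

A larger distance suggests the two samples come from different distributions. Turning the distance into a decision would still require a calibration step (for example a permutation test), which this sketch leaves out.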