Quantifying Network Similarity using Graph Cumulants
Authors: Gecia Bravo-Hermsdorff, Lee M. Gunderson, Pierre-André Maugis, Carey E. Priebe
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate via theory, simulation, and application to real data the superior statistical power of using graph cumulants. In summary, when analyzing data using subgraph/motif densities, we suggest using the corresponding graph cumulants instead. |
| Researcher Affiliation | Collaboration | Gecia Bravo-Hermsdorff EMAIL Department of Statistical Sciences University College London London, WC1E 7H8, United Kingdom; Lee M. Gunderson EMAIL Gatsby Computational Neuroscience Unit University College London London, W1T 4JG, United Kingdom; Pierre-André Maugis EMAIL Google Research Zürich, 8002, Switzerland; Carey E. Priebe EMAIL Department of Applied Mathematics and Statistics Whiting School of Engineering Johns Hopkins University Baltimore, MD 21218, USA |
| Pseudocode | No | The paper describes the steps of the statistical test in prose, for example, in Section 6.1: '1. choose r, the maximum order of the subgraph statistics being considered; 2. for each sample, estimate its distribution using these subgraph statistics; 3. quantify the difference between distributions using a notion of distance for the space of these subgraph statistics.' However, it does not present these steps in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. |
| Open Datasets | Yes | In particular, we use gene interaction networks from four different species: Mouse, Rat, Human, and Arabidopsis (a small flowering plant related to cabbage and mustard), all curated by the FunCoup repository (Persson et al., 2021). |
| Dataset Splits | No | The paper describes how synthetic data are generated as 'two samples, GA and GB, each containing s graphs sampled i.i.d. from unknown distributions GA and GB, respectively'. For real data, it states 'Each sample contains s graphs, all obtained from one of these adjusted networks by subsampling its nodes.' This describes data generation for the comparison, not specific training/test/validation splits for a machine learning model. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper describes the parameters of the statistical tests and data generation (e.g., 'choose r, the maximum order of the subgraph statistics being considered', 'Stochastic Block Models (SBMs) with two equal-sized communities and expected edge density ρ', 's graphs per sample', 'n nodes'). However, it does not include typical machine learning hyperparameters like learning rates, batch sizes, or optimizer settings, nor system-level training configurations, as the paper focuses on statistical tests rather than training models. |
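
Since the paper gives the test only in prose and releases no code, the three quoted steps can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: it uses plain subgraph densities (edges, wedges, triangles) rather than graph cumulants, it covers only a subset of the subgraphs with up to r = 3 edges, it uses the Euclidean distance between sample means as the notion of distance, and it parameterizes the two-community SBM with a hypothetical `eps` (within-community probability `rho + eps`, between-community probability `rho - eps`, so the expected edge density is roughly `rho`). All function names are hypothetical.

```python
import numpy as np


def sample_sbm(n, rho, eps, rng):
    """Sample a symmetric 0/1 adjacency matrix from a two-community SBM.

    Assumed parameterization (not from the paper): two communities of
    size n // 2, edge probability rho + eps within a community and
    rho - eps between communities.
    """
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, rho + eps, rho - eps)
    # Draw each unordered pair once (strict upper triangle), then mirror.
    upper = np.triu(rng.random((len(labels), len(labels))) < probs, k=1)
    return (upper | upper.T).astype(int)


def subgraph_densities(A):
    """Step 2: densities of edges, wedges, and triangles in one graph."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    edges = deg.sum() / 2
    wedges = (deg * (deg - 1) / 2).sum()  # pairs of edges sharing a node
    triangles = np.trace(A @ A @ A) / 6   # each triangle is counted 6 times
    n_pairs = n * (n - 1) / 2
    n_triples = n * (n - 1) * (n - 2) / 6
    # A node triple has 3 possible wedge centers, hence the factor 3.
    return np.array([edges / n_pairs,
                     wedges / (3 * n_triples),
                     triangles / n_triples])


def sample_distance(sample_a, sample_b):
    """Step 3: Euclidean distance between mean statistics (an assumption)."""
    mean_a = np.mean([subgraph_densities(A) for A in sample_a], axis=0)
    mean_b = np.mean([subgraph_densities(A) for A in sample_b], axis=0)
    return float(np.linalg.norm(mean_a - mean_b))


# Step 1 is the choice r = 3, implicit in subgraph_densities above.
rng = np.random.default_rng(0)
s, n, rho = 8, 40, 0.2
sample_a = [sample_sbm(n, rho, 0.0, rng) for _ in range(s)]  # no communities
sample_b = [sample_sbm(n, rho, 0.1, rng) for _ in range(s)]  # two communities
print(sample_distance(sample_a, sample_b))
```

Turning this distance into a test would additionally require a null distribution or threshold, which the quoted passage does not specify and this sketch leaves out.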