Sampling random graph homomorphisms and applications to network data analysis
Authors: Hanbaek Lyu, Facundo Memoli, David Sivakoff
JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide various examples and simulations demonstrating our framework through synthetic networks. We also demonstrate the performance of our framework on the tasks of network clustering and subgraph classification on the Facebook100 dataset and on Word Adjacency Networks of a set of classic novels. |
| Researcher Affiliation | Academia | Hanbaek Lyu EMAIL Department of Mathematics University of Wisconsin Madison, WI, USA 53706 Facundo Mémoli EMAIL Department of Mathematics The Ohio State University, Columbus, OH 43210, USA David Sivakoff EMAIL Department of Statistics and Department of Mathematics The Ohio State University, Columbus, OH 43210 |
| Pseudocode | Yes | Definition 2.1 (Glauber chain) Let F = ([k], AF ) be a simple motif and G = ([n], A, α) be a network. Suppose t(F, G) > 0 and fix a homomorphism x0 : F → G. Define a Markov chain xt of homomorphisms F → G as below. (i) Choose a node i ∈ [k] of F uniformly at random. (ii) Set xt+1(j) = xt(j) for j ≠ i. Update xt(i) = a to xt+1(i) = b according to the transition kernel... Definition 2.2 (Pivot chain) Let F = ([k], AF ) be a rooted tree motif and let G = ([n], A, α) be a network such that for each i ∈ [n], A(i, j) > 0 for some j ∈ [n]. Let x0 : [k] → [n] be an arbitrary homomorphism. Define a Markov chain xt of homomorphisms F → G as follows. (i) Given xt(1) = a, sample a node b ∈ [n] according to the distribution Ψ(a, ·), where the kernel Ψ : [n]2 → [0, 1] is defined by... (ii) Let π(1) denote the projection of the probability distribution πF→G (defined at (9)) onto the location of node 1. Then accept the update a → b and set xt+1(1) = b or reject the update and set xt+1(1) = a independently with probability λ or 1 − λ, respectively... |
| Open Source Code | Yes | All codes are available at https://github.com/Hanbaek Lyu/motif_sampling |
| Open Datasets | Yes | We also demonstrate the performance of our framework on the tasks of network clustering and subgraph classification on the Facebook100 dataset and on Word Adjacency Networks of a set of classic novels. ... In our experiment, we use the Facebook100 dataset, which consists of the snapshots of the Facebook network of 100 schools in the US in Sep. of 2005. Since it was first published and analyzed by Traud, Mucha, and Porter in Traud et al. (2012), it has been regarded as a standard data set in the field of social network analysis. ... The particular data set we will analyze in this section consists of the following 45 novels of the nine authors listed below: ... The frequency matrices corresponding to the above novels are recorded using a list of n = 211 function words (see supplementary material of Segarra et al. (2015)). |
| Dataset Splits | Yes | Out of the total of 200 labeled subgraphs, we use 100 (50 from each network) to train several logistic regression classifiers corresponding to the input features... Each trained logistic classifier is used to classify the remaining 100 labeled subgraphs (50 from each network). ... In order to generate the distance matrices, we partition the 45 novels into validation set and reference set of sizes 9 and 36, respectively, by randomly selecting a novel for each author. |
| Hardware Specification | Yes | We used a modest computational resource for our experiments: A quad-core 10th Gen. Intel Core i5-1035G7 Processor and 8 GB LPDDR4x RAM with Windows 10 operating system. |
| Software Dependencies | No | The paper mentions "All scripts for replicating the experiments can be obtained from our Git Hub repository https://github.com/Hanbaek Lyu/motif_sampling." While this points to where the software can be found, the paper text itself does not explicitly list specific software dependencies with their version numbers. |
| Experiment Setup | Yes | We used the chain motif F = ([21], 1{(1,2),(2,3),...,(20,21)}) of 20 edges, which we sampled from each network by Glauber chain (see Definition 2.1) for 2n log n iterations, where n denotes the number of nodes in the given network, which we denote by G. ... The problem setting is as follows. Suppose we have two networks G1 and G2, not necessarily of the same size. From each network Gi, we sample 100 connected subgraphs of 30 nodes by running a simple symmetric random walk on Gi until it visits 30 distinct nodes and then taking the induced subgraph on the sampled nodes. ... We repeated this process for 10^4 iterations to obtain a 9 × 9 × 10^4 array. The average of all 10^4 distance matrices for each of the three pairs of motifs are shown in Figure 32. ... For the classification test, we first choose k ∈ {1, 2, 3, 4} known texts and one text with unknown authorship from each author. |