Distributed Community Detection in Large Networks

Authors: Sheng Zhang, Rui Song, Wenbin Lu, Ji Zhu

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Section 5, we conduct simulation studies to examine the performance of the proposed method and compare it with other community detection methods. In Section 6, we apply the proposed method to an airline route network and a Facebook ego network.
Researcher Affiliation Academia Sheng Zhang EMAIL Rui Song EMAIL Wenbin Lu EMAIL Department of Statistics North Carolina State University Raleigh, NC 27607, USA. Ji Zhu EMAIL Department of Statistics University of Michigan Ann Arbor, MI 48109, USA.
Pseudocode Yes Algorithm 1: Distributed Community Detection
Open Source Code No The paper provides links to third-party tools (e.g., https://igraph.org/r/doc/cluster fast greedy.html, https://rdrr.io/cran/mixer/, https://cran.r-project.org/web/packages/randnet/randnet.pdf, https://rdrr.io/cran/Score Plus/man/SCORE.html) that were used in their work, but does not explicitly state that the authors' own implementation code for the methodology described in this paper is publicly available or provide a link to such code.
Open Datasets Yes 1. https://openflights.org/data.html 2. https://snap.stanford.edu/data
Dataset Splits Yes Specifically, we first randomly sample a proportion of the node pairs as the testing set and mask the corresponding edge Aij values as zero; then we apply the proposed community detection method to obtain ˆBn which can be estimated according to the detected community labels ˆc by considering the connection probability of nodes both within and between communities. Lastly, we evaluate the performance of ˆBn on the testing set using the area under the receiver operating characteristic curve (AUC). The left panel in Figure 7 shows how the AUC value changes when we vary the proportion of masked node pairs. The x-axis for masked proportion is set as 0.1, 0.3, 0.5, 0.7, 0.99 respectively.
Hardware Specification Yes all experiments are conducted via a Dell R7425 machine with dual processor AMD Epyc 32 core 2.2 GHZ with 8GB of RAM
Software Dependencies No The paper mentions several R packages for implementing comparative methods (e.g., 'cluster fast greedy.html' for the fast-greedy algorithm, 'mixer' for VSBM, 'randnet' for LRBIC and SSP, 'Score Plus' for SCORE) but does not provide specific version numbers for these packages, the R environment itself, or any other comprehensive list of software dependencies required to reproduce their experiments.
Experiment Setup Yes Specifically, we generate networks using the SBM with G = 4 groups, the four groups contain 2, 3, 3 and 4 communities, respectively. The community frequencies are set as π = (1/12, 1/12, .., 1/12)T . The probability matrix Bn R12 12 is generated as follows: for nodes vi and vj belonging to the same group, Bcicj Unif(1/100, 1); for nodes vi and vj from different groups, Bcicj Unif(0, 1/100). ... Under the setting of DCSBM, we follow the network generation strategy as in SBM, except that now we have the extra degree parameter θ = (θ1, . . . , θn), and they are generated independently as follows, P(θi = mx) = P(θi = x) = 1 where we set m = 3/2 and x = 2/(m + 1) so that E(θi) = 1 is satisfied as discussed in Section 4.2.