reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributed Community Detection in Large Networks

Authors: Sheng Zhang, Rui Song, Wenbin Lu, Ji Zhu

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5, we conduct simulation studies to examine the performance of the proposed method and compare it with other community detection methods. In Section 6, we apply the proposed method to an airline route network and a Facebook ego network.
Researcher Affiliation	Academia	Sheng Zhang EMAIL Rui Song EMAIL Wenbin Lu EMAIL Department of Statistics North Carolina State University Raleigh, NC 27607, USA. Ji Zhu EMAIL Department of Statistics University of Michigan Ann Arbor, MI 48109, USA.
Pseudocode	Yes	Algorithm 1: Distributed Community Detection
Open Source Code	No	The paper provides links to third-party tools (e.g., https://igraph.org/r/doc/cluster fast greedy.html, https://rdrr.io/cran/mixer/, https://cran.r-project.org/web/packages/randnet/randnet.pdf, https://rdrr.io/cran/Score Plus/man/SCORE.html) that were used in their work, but does not explicitly state that the authors' own implementation code for the methodology described in this paper is publicly available or provide a link to such code.
Open Datasets	Yes	1. https://openflights.org/data.html 2. https://snap.stanford.edu/data
Dataset Splits	Yes	Speciﬁcally, we ﬁrst randomly sample a proportion of the node pairs as the testing set and mask the corresponding edge Aij values as zero; then we apply the proposed community detection method to obtain ˆBn which can be estimated according to the detected community labels ˆc by considering the connection probability of nodes both within and between communities. Lastly, we evaluate the performance of ˆBn on the testing set using the area under the receiver operating characteristic curve (AUC). The left panel in Figure 7 shows how the AUC value changes when we vary the proportion of masked node pairs. The x-axis for masked proportion is set as 0.1, 0.3, 0.5, 0.7, 0.99 respectively.
Hardware Specification	Yes	all experiments are conducted via a Dell R7425 machine with dual processor AMD Epyc 32 core 2.2 GHZ with 8GB of RAM
Software Dependencies	No	The paper mentions several R packages for implementing comparative methods (e.g., 'cluster fast greedy.html' for the fast-greedy algorithm, 'mixer' for VSBM, 'randnet' for LRBIC and SSP, 'Score Plus' for SCORE) but does not provide specific version numbers for these packages, the R environment itself, or any other comprehensive list of software dependencies required to reproduce their experiments.
Experiment Setup	Yes	Speciﬁcally, we generate networks using the SBM with G = 4 groups, the four groups contain 2, 3, 3 and 4 communities, respectively. The community frequencies are set as π = (1/12, 1/12, .., 1/12)T . The probability matrix Bn R12 12 is generated as follows: for nodes vi and vj belonging to the same group, Bcicj Unif(1/100, 1); for nodes vi and vj from diﬀerent groups, Bcicj Unif(0, 1/100). ... Under the setting of DCSBM, we follow the network generation strategy as in SBM, except that now we have the extra degree parameter θ = (θ1, . . . , θn), and they are generated independently as follows, P(θi = mx) = P(θi = x) = 1 where we set m = 3/2 and x = 2/(m + 1) so that E(θi) = 1 is satisﬁed as discussed in Section 4.2.