Significance-based community detection in weighted networks

Authors: John Palowitch, Shankar Bhamidi, Andrew B. Nobel

JMLR 2017 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive simulations, we show that the accuracy of our proposed extraction method is highly competitive with other community detection approaches on weighted networks with both disjoint and overlapping communities, and on weighted networks with background nodes. To further validate the method, we apply it to two real data sets with (arguably) potential overlapping communities and background nodes. [...] This section contains a performance analysis of CCME and existing methods on a benchmarking simulation framework.
Researcher Affiliation Academia John Palowitch EMAIL Shankar Bhamidi EMAIL Andrew B. Nobel EMAIL Department of Statistics and Operations Research University of North Carolina at Chapel Hill Chapel Hill, NC 27599
Pseudocode Yes Core update Uα 1. Given: graph G with nodes N and input set B N 2. Calculate p-values p := {p(u, B, G) : u N} 3. Obtain threshold τ(p) from a multiple-testing procedure 4. Output set B = {u : p(u, B, G) τ(p)} [...] Stable community search (SCS) algorithm 1. Given weighted graph G with nodes N and initial set B0 N; t 0 2. Set Bt+1 Uα(Bt, G) and t t + 1. 3. If Bt = Bt for some t < t, terminate. Else, return to step 2.
Open Source Code Yes The R code for the CCME method is available in the github repository jpalowitch/CCME . The code for reproducing the analyses in Sections 6 and 7 is available at the github repository jpalowitch/CCME analyses .
Open Datasets Yes The first application involves commercial airline flight data, obtained from the Bureau of Transportation Statistics (www.transtats.bts.gov). [...] An email corpus from the company ENRON was made available in 2009. The unweighted network formed by linking communicating email addresses is well-studied; see www.cs.cmu.edu/~./enron for references and Leskovec et al. (2010) for the data.
Dataset Splits No The paper describes simulation parameters and how networks were generated, including that "At each unique parameter setting, 20 random networks were simulated," but it does not provide explicit training/test/validation dataset splits for reproducibility of a learning experiment on a fixed dataset. For the real-world datasets, the methods were applied to the entire networks without specifying such splits.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies No The paper mentions 'The R code for the CCME method' and 'igraph' R package but does not specify version numbers for R or any other key software dependencies.
Experiment Setup Yes The parameter α, used in the set update operation Uα, must be specified by the user of CCME. Having a natural interpretation as the false-discovery rate for each update, α was set to 0.05 for all simulations and real data analyses introduced in this paper. [...] The following choices for parameters were made regardless of the simulation setting: τ2 = 2, k = n, kmax = 3k (three settings which make the degree/strength distributions skewed and the network sparse), β = 0.5 (to induce a non-trivial power law between strengths and degrees), τ1 = 1, mmin = n/5, mmax = 3mmax/2 (settings which produce between about 3 and 7 communities per network with skewed size distribution), and σ2 = 1/2.