reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Significance-based community detection in weighted networks

Authors: John Palowitch, Shankar Bhamidi, Andrew B. Nobel

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive simulations, we show that the accuracy of our proposed extraction method is highly competitive with other community detection approaches on weighted networks with both disjoint and overlapping communities, and on weighted networks with background nodes. To further validate the method, we apply it to two real data sets with (arguably) potential overlapping communities and background nodes. [...] This section contains a performance analysis of CCME and existing methods on a benchmarking simulation framework.
Researcher Affiliation	Academia	John Palowitch EMAIL Shankar Bhamidi EMAIL Andrew B. Nobel EMAIL Department of Statistics and Operations Research University of North Carolina at Chapel Hill Chapel Hill, NC 27599
Pseudocode	Yes	Core update Uα 1. Given: graph G with nodes N and input set B N 2. Calculate p-values p := {p(u, B, G) : u N} 3. Obtain threshold τ(p) from a multiple-testing procedure 4. Output set B = {u : p(u, B, G) τ(p)} [...] Stable community search (SCS) algorithm 1. Given weighted graph G with nodes N and initial set B0 N; t 0 2. Set Bt+1 Uα(Bt, G) and t t + 1. 3. If Bt = Bt for some t < t, terminate. Else, return to step 2.
Open Source Code	Yes	The R code for the CCME method is available in the github repository jpalowitch/CCME . The code for reproducing the analyses in Sections 6 and 7 is available at the github repository jpalowitch/CCME analyses .
Open Datasets	Yes	The ﬁrst application involves commercial airline ﬂight data, obtained from the Bureau of Transportation Statistics (www.transtats.bts.gov). [...] An email corpus from the company ENRON was made available in 2009. The unweighted network formed by linking communicating email addresses is well-studied; see www.cs.cmu.edu/~./enron for references and Leskovec et al. (2010) for the data.
Dataset Splits	No	The paper describes simulation parameters and how networks were generated, including that "At each unique parameter setting, 20 random networks were simulated," but it does not provide explicit training/test/validation dataset splits for reproducibility of a learning experiment on a fixed dataset. For the real-world datasets, the methods were applied to the entire networks without specifying such splits.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions 'The R code for the CCME method' and 'igraph' R package but does not specify version numbers for R or any other key software dependencies.
Experiment Setup	Yes	The parameter α, used in the set update operation Uα, must be speciﬁed by the user of CCME. Having a natural interpretation as the false-discovery rate for each update, α was set to 0.05 for all simulations and real data analyses introduced in this paper. [...] The following choices for parameters were made regardless of the simulation setting: τ2 = 2, k = n, kmax = 3k (three settings which make the degree/strength distributions skewed and the network sparse), β = 0.5 (to induce a non-trivial power law between strengths and degrees), τ1 = 1, mmin = n/5, mmax = 3mmax/2 (settings which produce between about 3 and 7 communities per network with skewed size distribution), and σ2 = 1/2.