Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Community detection in sparse latent space models

Authors: Fengnan Gao, Zongming Ma, Hongsong Yuan

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate numerical prowess of the method on simulated and real data examples in Sections 4 and 5, respectively.
Researcher Affiliation | Academia | Fengnan Gao (EMAIL), School of Data Science, Fudan University and Shanghai Center for Mathematical Sciences, N202 Zibin, 220 Handan Road, Shanghai 200433, China; Zongming Ma (EMAIL), Department of Statistics and Data Science, University of Pennsylvania, 265 South 37th Street, Philadelphia, PA 19104, USA; Hongsong Yuan (EMAIL), Research Institute for Interdisciplinary Sciences and School of Information Management and Engineering, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China
Pseudocode | Yes | Algorithm 1: Initialization; Algorithm 2: Local Refinement; Algorithm 3: A provable version of the latent space model community detection method
Open Source Code | No | The paper does not contain any explicit statement about releasing code for the methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | The first three datasets are the political blogs network with 1222 nodes, 16714 edges, and 2 communities (Adamic and Glance, 2005); Simmons College with 1137 nodes, 24257 edges, and 4 communities; and Caltech with 590 nodes, 12822 edges, and 8 communities (Traud et al., 2011, 2012). The fourth dataset is a manufacturing company network from Cross and Parker (2004), which was studied in Weng and Feng (2022). The fifth dataset is a French high school friendship network (Mastrandrea et al., 2015).
Dataset Splits | No | The paper mentions data generation parameters for simulation studies (e.g., 'The nodes were split into two clusters of sizes n1 = n2 = 500') and describes real-world datasets, but it does not specify train/test/validation splits for experimental evaluation or reproduction.
Hardware Specification | Yes | All reported results were obtained on a Windows 7 PC with two Intel Xeon processors (E5-2630 v3 @ 2.40GHz) and 64G RAM.
Software Dependencies | No | The paper mentions the operating system ('Windows 7') but does not specify any software libraries, frameworks, or their version numbers used for the experiments.
Experiment Setup | Yes | We set up model (1) with latent space dimension d = 3 and size n = 1000. The nodes were split into two clusters of sizes n1 = n2 = 500. For i = 1, ..., n1, we generated i.i.d. z_i ~ N_d(µ, τ²I_d), where µ = (0.5, 1, 0)^T, and for i = n1 + 1, ..., n, we generated i.i.d. z_i ~ N_d(−µ, τ²I_d). We varied τ ∈ {0.75, 0.5, 0.25}. In addition, we let H = diag(1, 1, 0.5), and generated α_i = ᾱ + ω_i, where ᾱ = −2.49 (so that the median degree n·e^{2ᾱ} ≈ log n) and ω_i ~ i.i.d. N(0, 1). We compare Algorithm 1 + one-round Algorithm 2 refinement (SpecLoRe, R = 1) and Algorithm 1 + ten-round Algorithm 2 refinement (SpecLoRe, R = 10) to LSCD in Ma et al. (2020) (initialized by Algorithm 3 in Ma et al., 2020, followed by Algorithm 1 in Ma et al., 2020 with 800 iterations).
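The simulated data generation quoted above can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' code: in particular, the logistic inner-product link logit P(A_ij = 1) = α_i + α_j + z_iᵀ H z_j is an assumption about the form of model (1), based on the related latent space model of Ma et al. (2020), and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters from the quoted setup.
d, n = 3, 1000
n1 = n2 = 500
tau = 0.5                        # one of {0.75, 0.5, 0.25}
mu = np.array([0.5, 1.0, 0.0])
H = np.diag([1.0, 1.0, 0.5])
alpha_bar = -2.49                # so median degree n * exp(2*alpha_bar) ~ log(n)

# Latent positions: two clusters centered at +mu and -mu.
signs = np.repeat([1.0, -1.0], [n1, n2])
z = signs[:, None] * mu + tau * rng.standard_normal((n, d))

# Degree heterogeneity parameters alpha_i = alpha_bar + omega_i, omega_i ~ N(0, 1).
alpha = alpha_bar + rng.standard_normal(n)

# Assumed link (logistic inner-product form): logit P(A_ij = 1) = alpha_i + alpha_j + z_i^T H z_j.
logits = alpha[:, None] + alpha[None, :] + z @ H @ z.T
P = 1.0 / (1.0 + np.exp(-logits))

# Sample a symmetric adjacency matrix with no self-loops.
U = rng.random((n, n))
A = np.triu((U < P).astype(int), k=1)
A = A + A.T
```

Under this parameterization, exp(2 × (−2.49)) × 1000 ≈ 6.9 ≈ log(1000), which is consistent with the quoted remark that the median degree is about log n.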