Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Community detection in sparse latent space models
Authors: Fengnan Gao, Zongming Ma, Hongsong Yuan
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate numerical prowess of the method on simulated and real data examples in Sections 4 and 5, respectively. |
| Researcher Affiliation | Academia | Fengnan Gao: School of Data Science, Fudan University, and Shanghai Center for Mathematical Sciences, N202 Zibin, 220 Handan Road, Shanghai 200433, China; Zongming Ma: Department of Statistics and Data Science, University of Pennsylvania, 265 South 37th Street, Philadelphia, PA 19104, USA; Hongsong Yuan: Research Institute for Interdisciplinary Sciences and School of Information Management and Engineering, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China |
| Pseudocode | Yes | Algorithm 1: Initialization; Algorithm 2: Local Refinement; Algorithm 3: A provable version of latent space model community detection method |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The first three datasets are the political blogs network with 1222 nodes, 16714 edges, and 2 communities (Adamic and Glance, 2005); Simmons College with 1137 nodes, 24257 edges, and 4 communities; and Caltech with 590 nodes, 12822 edges, and 8 communities (Traud et al., 2011, 2012). The fourth dataset is a manufacturing company network from Cross and Parker (2004), which was studied in Weng and Feng (2022). The fifth dataset is a French high school friendship network (Mastrandrea et al., 2015). |
| Dataset Splits | No | The paper mentions data generation parameters for simulation studies (e.g., 'The nodes were split into two clusters of sizes $n_1 = n_2 = 500$') and describes real-world datasets, but it does not specify train/test/validation splits for experimental evaluation or reproduction. |
| Hardware Specification | Yes | All reported results were obtained on a Windows 7 PC with two Intel Xeon processors (E5-2630 v3 @ 2.40 GHz) and 64 GB RAM. |
| Software Dependencies | No | The paper mentions the operating system ('Windows 7') but does not specify any software libraries, frameworks, or their version numbers used for the experiments. |
| Experiment Setup | Yes | We set up model (1) with latent space dimension $d = 3$ and size $n = 1000$. The nodes were split into two clusters of sizes $n_1 = n_2 = 500$. For $i = 1, \ldots, n_1$, we generated i.i.d. $z_i \sim N_d(\mu, \tau^2 I_d)$, where $\mu = (0.5, 1, 0)^\top$, and for $i = n_1 + 1, \ldots, n$, we generated i.i.d. $z_i \sim N_d(-\mu, \tau^2 I_d)$. We varied $\tau \in \{0.75, 0.5, 0.25\}$. In addition, we let $H = \mathrm{diag}(1, 1, 0.5)$, and generated $\alpha_i = \alpha + \omega_i$, where $\alpha = -2.49$ (so that the median degree $n e^{2\alpha} \approx \log n$) and $\omega_i \overset{\text{iid}}{\sim} N(0, 1)$. We compare Algorithm 1 + one-round Algorithm 2 refinement (SpecLoRe-R1) and Algorithm 1 + ten-round Algorithm 2 refinement (SpecLoRe-R10) to LSCD in Ma et al. (2020) (initialized by Algorithm 3 in Ma et al., 2020, followed by Algorithm 1 in Ma et al., 2020, with 800 iterations). |
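To make the simulation setup concrete, here is a minimal NumPy sketch of how one synthetic network might be generated. It assumes that model (1) is a logistic latent space model, $\mathrm{logit}(P_{ij}) = \alpha_i + \alpha_j + z_i^\top H z_j$, with undirected edges and no self-loops. The excerpt does not state model (1), so this form and the helper names `baseline_alpha` and `simulate_latent_space_network` are hypothetical. The `baseline_alpha` helper checks the degree calibration: solving $n e^{2\alpha} = \log n$ at $n = 1000$ gives $\alpha \approx -2.49$, which matches the reported magnitude of 2.49 (the sign must be negative for $n e^{2\alpha}$ to be a plausible degree). This sketch does not implement the paper's SpecLoRe or LSCD methods.

```python
import numpy as np


def baseline_alpha(n: int) -> float:
    """Solve n * exp(2 * alpha) = log(n) for alpha, i.e. median degree ~ log n
    (ignoring the latent term and the omega_i noise)."""
    return -0.5 * np.log(n / np.log(n))


def simulate_latent_space_network(n=1000, tau=0.5, alpha=-2.49, seed=0):
    """Hypothetical sketch of one simulated network.

    ASSUMPTION: model (1) is taken to be
        logit(P_ij) = alpha_i + alpha_j + z_i^T H z_j,
    which is not stated in the excerpt above.
    Returns the adjacency matrix A and the true cluster labels.
    """
    rng = np.random.default_rng(seed)
    d = 3
    n1 = n // 2
    mu = np.array([0.5, 1.0, 0.0])
    labels = np.repeat([0, 1], [n1, n - n1])

    # Cluster 0 centered at +mu, cluster 1 at -mu, both with covariance tau^2 I_d.
    centers = np.where(labels[:, None] == 0, mu, -mu)
    z = centers + tau * rng.standard_normal((n, d))

    H = np.diag([1.0, 1.0, 0.5])

    # Degree heterogeneity: alpha_i = alpha + omega_i with omega_i ~ N(0, 1).
    alpha_i = alpha + rng.standard_normal(n)

    # Edge probabilities through the (assumed) logistic link.
    logits = alpha_i[:, None] + alpha_i[None, :] + z @ H @ z.T
    P = 1.0 / (1.0 + np.exp(-logits))

    # Sample the strict upper triangle (no self-loops), then symmetrize.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(np.int8)
    return A, labels
```

For example, `A, labels = simulate_latent_space_network(tau=0.25)` would produce one replicate at the smallest $\tau$ listed in the setup. The `labels` array serves as ground truth for scoring any community detection method run on `A`.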