reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Community detection in sparse latent space models

Authors: Fengnan Gao, Zongming Ma, Hongsong Yuan

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate numerical prowess of the method on simulated and real data examples in Sections 4 and 5, respectively.
Researcher Affiliation	Academia	Fengnan Gao EMAIL School of Data Science, Fudan University and Shanghai Center for Mathematical Sciences N202 Zibin, 220 Handan Road, Shanghai 200433, China; Zongming Ma EMAIL Department of Statistics and Data Science University of Pennsylvania 265 South 37th Street, Philadelphia, PA 19104, USA; Hongsong Yuan EMAIL Research Institute for Interdisciplinary Sciences and School of Information Management and Engineering Shanghai University of Finance and Economics 777 Guoding Road, Shanghai 200433, China
Pseudocode	Yes	Algorithm 1: Initialization; Algorithm 2: Local Refinement; Algorithm 3: A provable version of latent space model community detection method
Open Source Code	No	The paper does not contain any explicit statement about releasing code for the methodology, nor does it provide a link to a code repository.
Open Datasets	Yes	The first three datasets are political blog with 1222 nodes, 16714 edges, and 2 communities (Adamic and Glance, 2005), Simmons College with 1137 nodes, 24257 edges, and 4 communities and Caltech data with 590 nodes, 12822 edges, and 8 communities (Traud et al., 2011, 2012). The fourth dataset is a manufacturing company network from Cross and Parker (2004), which was studied in Weng and Feng (2022). The fifth dataset is a French high school friendship network (Mastrandrea et al., 2015).
Dataset Splits	No	The paper mentions data generation parameters for simulation studies (e.g., 'The nodes were split into two clusters of sizes $n_1 = n_2 = 500$') and describes real-world datasets, but it does not specify train/test/validation splits for experimental evaluation or reproduction.
Hardware Specification	Yes	All reported results were obtained on a Windows 7 PC with two Intel Xeon Processors (E5-2630 v3@2.40GHz) and 64G RAM.
Software Dependencies	No	The paper mentions the operating system ('Windows 7') but does not specify any software libraries, frameworks, or their version numbers used for the experiments.
Experiment Setup	Yes	We set up model (1) with latent space dimension $d = 3$ and size $n = 1000$. The nodes were split into two clusters of sizes $n_1 = n_2 = 500$. For $i = 1, \dots, n_1$, we generated i.i.d. $z_i \sim N_d(\mu, \tau^2 I_d)$, where $\mu = (0.5, 1, 0)^\top$, and for $i = n_1 + 1, \dots, n$, we generated i.i.d. $z_i \sim N_d(-\mu, \tau^2 I_d)$. We varied $\tau \in \{0.75, 0.5, 0.25\}$. In addition, we let $H = \mathrm{diag}(1, 1, 0.5)$, and generated $\alpha_i = \alpha + \omega_i$, where $\alpha = -2.49$ (so that the median degree $n e^{2\alpha} \approx \log n$) and $\omega_i \overset{\text{iid}}{\sim} N(0, 1)$. We compare Algorithm 1 + one-round Algorithm 2 refinement (SpecLoRe, $R = 1$) and Algorithm 1 + ten-round Algorithm 2 refinement (SpecLoRe, $R = 10$) to LSCD in Ma et al. (2020) (initialized by Algorithm 3 in Ma et al., 2020 followed by Algorithm 1 in Ma et al., 2020 with 800 iterations).
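
Since no code was released, the simulation setup quoted above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the second cluster is centered at -µ and that α = -2.49, the sign for which n·e^(2α) is close to log n (roughly 6.87 versus 6.91 for n = 1000). The excerpt does not state the form of model (1), so the logistic link in `edge_prob` is an assumption, and the names `simulate_setup`, `edge_prob`, and `sample_graph` are hypothetical.

```python
import math
import random

H_DIAG = (1.0, 1.0, 0.5)   # H = diag(1, 1, 0.5) from the quoted setup
MU = (0.5, 1.0, 0.0)       # cluster mean mu; the second cluster uses -mu (assumed sign)


def simulate_setup(n=1000, tau=0.5, alpha=-2.49, seed=0):
    """Draw latent positions z_i, degree parameters alpha_i, and cluster labels."""
    rng = random.Random(seed)
    n1 = n // 2
    labels = [0] * n1 + [1] * (n - n1)
    z = []
    for lab in labels:
        sign = 1.0 if lab == 0 else -1.0
        # z_i ~ N_d(+/- mu, tau^2 I_d), drawn coordinate by coordinate
        z.append([sign * m + tau * rng.gauss(0.0, 1.0) for m in MU])
    # alpha_i = alpha + omega_i with omega_i ~ N(0, 1)
    alphas = [alpha + rng.gauss(0.0, 1.0) for _ in range(n)]
    return z, alphas, labels


def edge_prob(a_i, a_j, z_i, z_j):
    """ASSUMPTION: logistic link on a_i + a_j + z_i^T H z_j; model (1) is not quoted."""
    eta = a_i + a_j + sum(h * x * y for h, x, y in zip(H_DIAG, z_i, z_j))
    return 1.0 / (1.0 + math.exp(-eta))


def sample_graph(z, alphas, seed=1):
    """Sample an undirected edge list with independent Bernoulli edges."""
    rng = random.Random(seed)
    n = len(z)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob(alphas[i], alphas[j], z[i], z[j]):
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    n = 1000
    # Sanity check on the sign of alpha: n * e^(2 * alpha) should be near log n.
    print(n * math.exp(2 * -2.49), math.log(n))
    z, alphas, labels = simulate_setup(n=n, tau=0.5)
    print(len(sample_graph(z, alphas)), "edges sampled")
```

Varying `tau` over 0.75, 0.5, and 0.25 reproduces the three noise levels mentioned in the setup; smaller values give better separated clusters.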