reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Higher-Order Spectral Clustering Under Superimposed Stochastic Block Models

Authors: Subhadeep Paul, Olgica Milenkovic, Yuguo Chen

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We then proceed to rigorously analyze the performance of a recently proposed higher-order spectral clustering method on the Sup SBM. In particular, we prove non-asymptotic upper bounds on the misclustering error of higher-order spectral community detection for a Sup SBM setting in which triangles are superimposed with undirected edges. We assess the model ﬁt of the proposed model and compare it with existing random graph models in terms of observed properties of real network data obtained from diverse domains by sampling networks from the ﬁtted models and a nonparametric network cross-validation approach.
Researcher Affiliation	Academia	Subhadeep Paul EMAIL Department of Statistics The Ohio State University Columbus, OH 43210, USA; Olgica Milenkovic EMAIL Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Urbana, IL 61801, USA; Yuguo Chen EMAIL Department of Statistics University of Illinois at Urbana-Champaign Champaign, IL 61820, USA
Pseudocode	No	The paper describes algorithms (e.g., spectral clustering, greedy clustering algorithm), but it refers to an external algorithm (Gao et al. (2017, Algorithm 2)) rather than providing its own pseudocode or algorithm blocks within the text.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or indicate that code is included in supplementary materials for the methodology described.
Open Datasets	Yes	The karate club data (Zachary, 1977) is a frequently used benchmark dataset for network community detection... This dataset describes an undirected social network involving 62 dolphins in Doubtful Sound, New Zealand, curated by Lusseau et al. (2003). This dataset contains the entire connectome or wiring diagram of the nervous system of a small nematode called Caenorhabditis Elegans (Chen et al., 2006; White et al., 1986; Sohn et al., 2011; Vershynin, 2010). The political blogs dataset (Adamic and Glance, 2005), collected during the 2004 US presidential election...
Dataset Splits	Yes	The s-fold network cross-validation approach of Li et al. (2020b) randomly splits the pairs of vertices in the network into s groups. The training data is formed with s 1 sets, and the remaining set is used as test data. ...For each of our datasets, we compare the SBM and Sup SBM with a 10-fold cross-validation error.
Hardware Specification	No	The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU/CPU models, processor types, or details about computational resources.
Software Dependencies	No	The paper discusses various algorithms and models (e.g., spectral clustering, k-means algorithm, SBM, Sup SBM), but it does not specify any particular software libraries or their version numbers that were used for implementation.
Experiment Setup	Yes	Spectral clustering is applied to the motif adjacency matrix in a standard form in order to ﬁnd communities of motifs. ... The algorithm subsequently performs the greedy clustering algorithm in Gao et al. (2017, Algorithm 2) on the rows of the n k matrix of eigenvectors... Once we have obtained the community assignments, the parameters can be estimated using an approximate generalized method of moments approach... We choose K as the value that minimizes the cross-validation error. ...we set the number of communities K = 2 for both SBM and Sup SBM. ...the eigenvectors are row-normalized before applying the k-means algorithm.