Higher-Order Spectral Clustering Under Superimposed Stochastic Block Models

Authors: Subhadeep Paul, Olgica Milenkovic, Yuguo Chen

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We then proceed to rigorously analyze the performance of a recently proposed higher-order spectral clustering method on the Sup SBM. In particular, we prove non-asymptotic upper bounds on the misclustering error of higher-order spectral community detection for a Sup SBM setting in which triangles are superimposed with undirected edges. We assess the model fit of the proposed model and compare it with existing random graph models in terms of observed properties of real network data obtained from diverse domains by sampling networks from the fitted models and a nonparametric network cross-validation approach.
Researcher Affiliation Academia Subhadeep Paul EMAIL Department of Statistics The Ohio State University Columbus, OH 43210, USA; Olgica Milenkovic EMAIL Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Urbana, IL 61801, USA; Yuguo Chen EMAIL Department of Statistics University of Illinois at Urbana-Champaign Champaign, IL 61820, USA
Pseudocode No The paper describes algorithms (e.g., spectral clustering, greedy clustering algorithm), but it refers to an external algorithm (Gao et al. (2017, Algorithm 2)) rather than providing its own pseudocode or algorithm blocks within the text.
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or indicate that code is included in supplementary materials for the methodology described.
Open Datasets Yes The karate club data (Zachary, 1977) is a frequently used benchmark dataset for network community detection... This dataset describes an undirected social network involving 62 dolphins in Doubtful Sound, New Zealand, curated by Lusseau et al. (2003). This dataset contains the entire connectome or wiring diagram of the nervous system of a small nematode called Caenorhabditis Elegans (Chen et al., 2006; White et al., 1986; Sohn et al., 2011; Vershynin, 2010). The political blogs dataset (Adamic and Glance, 2005), collected during the 2004 US presidential election...
Dataset Splits Yes The s-fold network cross-validation approach of Li et al. (2020b) randomly splits the pairs of vertices in the network into s groups. The training data is formed with s 1 sets, and the remaining set is used as test data. ...For each of our datasets, we compare the SBM and Sup SBM with a 10-fold cross-validation error.
Hardware Specification No The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU/CPU models, processor types, or details about computational resources.
Software Dependencies No The paper discusses various algorithms and models (e.g., spectral clustering, k-means algorithm, SBM, Sup SBM), but it does not specify any particular software libraries or their version numbers that were used for implementation.
Experiment Setup Yes Spectral clustering is applied to the motif adjacency matrix in a standard form in order to find communities of motifs. ... The algorithm subsequently performs the greedy clustering algorithm in Gao et al. (2017, Algorithm 2) on the rows of the n k matrix of eigenvectors... Once we have obtained the community assignments, the parameters can be estimated using an approximate generalized method of moments approach... We choose K as the value that minimizes the cross-validation error. ...we set the number of communities K = 2 for both SBM and Sup SBM. ...the eigenvectors are row-normalized before applying the k-means algorithm.