Community Detection and Stochastic Block Models: Recent Developments
Authors: Emmanuel Abbe
JMLR 2017 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This note surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational thresholds, and for various recovery requirements such as exact, partial and weak recovery (a.k.a., detection). The main results discussed are the phase transitions for exact recovery at the Chernoff-Hellinger threshold, the phase transition for weak recovery at the Kesten-Stigum threshold, the optimal distortion-SNR tradeofffor partial recovery, the learning of the SBM parameters and the gap between information-theoretic and computational thresholds. The note also covers some of the algorithms developed in the quest of achieving the limits, in particular two-round algorithms via graph-splitting, semi-definite programming, linearized belief propagation, classical and nonbacktracking spectral methods. |
| Researcher Affiliation | Academia | Emmanuel Abbe EMAIL Program in Applied and Computational Mathematics and Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA |
| Pseudocode | Yes | Nonbacktracking eigenvector extraction algorithm Krzakala et al. (2013); Bordenave et al. (2015). Input: An n-vertex graph g and a parameter τ R. (1) Construct the nonbacktracking matrix B of the graph g. (2) Extract the eigenvector ξ2 corresponding to the second largest eigenvalue of B. (3) Assign vertex v to the first community if P e:e2=v ξ2(e) > τ/ n and to the second community otherwise. |
| Open Source Code | No | The paper describes various algorithms and methods but does not provide any specific link or statement indicating that the source code for the implementations described in the paper is openly available. The license information provided refers to the paper itself, not its code. |
| Open Datasets | Yes | We use the blogosphere data set from the 2004 US political elections Adamic and Glance (2005) as an archetype example. |
| Dataset Splits | No | The paper mentions the use of a dataset (blogosphere data set from Adamic and Glance (2005)) to illustrate the impact of the developed theory. However, it does not describe any specific training, validation, or test splits for this dataset within the context of experiments conducted in this paper. It refers to analysis results by other researchers based on this dataset. |
| Hardware Specification | No | This paper is a theoretical survey focusing on fundamental limits and algorithms for community detection. It does not describe any experimental setup involving specific hardware to run empirical tests or generate results. |
| Software Dependencies | No | This paper is a theoretical survey and does not describe any specific software implementations or dependencies with version numbers used to conduct experiments. It mentions various algorithms but not the software used to run them for the paper's findings. |
| Experiment Setup | No | This paper is a theoretical survey. It describes algorithms and their theoretical properties but does not detail any experimental setups, hyperparameters, or training configurations for empirical evaluation within the paper itself. |