Statistical guarantees for local graph clustering

Authors: Wooseok Ha, Kimon Fountoulakis, Michael W. Mahoney

JMLR 2021 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we consider the so-called ℓ1-regularized Page Rank algorithm (Fountoulakis et al., 2019), a popular algorithm for the local graph clustering problem, and we establish statistical recoverability guarantees for it... Empirically we demonstrate the ability of the method to recover the target cluster in a range of real-world data graphs. Finally, we establish an equivalence between ℓ1-regularized Page Rank and the very popular local graph clustering algorithm from Andersen et al. (2006); Zhu et al. (2013). This allows us to prove similar average case guarantees for the algorithm of Andersen et al. (2006); Zhu et al. (2013) as well.
Researcher Affiliation Academia Wooseok Ha EMAIL Department of Statistics, University of California at Berkeley, Berkeley, CA, USA. Kimon Fountoulakis EMAIL School of Computer Science, University of Waterloo, Waterloo, ON, Canada. Michael W. Mahoney EMAIL ICSI and Department of Statistics, University of California at Berkeley, Berkeley, CA, USA.
Pseudocode Yes Algorithm 1: A modified BFS algorithm for seed set expansion
Open Source Code No The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets Yes Sfld. This dataset contains pairwise similarities of blasted sequences of 232 proteins belonging to the amidohydrolase superfamily (Brown et al., 2006)... PPI-mips. This dataset is a protein-protein interaction graph of mammalian species (Pagel et al., 2004)... FB-Johns55. This graph is a Facebook anonymized dataset... from Traud et al. (2011, 2012)... Orkut. Orkut is a free on-line social network... It can be downloaded from Leskovec and Krevl (2014).
Dataset Splits No The paper describes generating synthetic data and using existing 'ground truth clusters' for real datasets, but it does not specify explicit training/test/validation splits or percentages for these datasets to reproduce experiments.
Hardware Specification No The paper includes a 'Memory scalability' section and tables with running times, but it does not specify any particular CPU, GPU, or other hardware used for the experiments.
Software Dependencies No The paper mentions using a 'proximal coordinate descent algorithm' and refers to 'Dinic’s algorithm' but does not specify any software names with version numbers for implementation or execution of the methods described.
Experiment Setup Yes For the experiments in Figure 2, we fix the teleportation parameter α = 0.1. We generate graphs from the stochastic block model which consists of 10 clusters, each of which has 20 nodes and only one of which is the target cluster K. We use the same parameters p and q across different clusters to generate edges within and between clusters. Here we set p = 0.5 and q is varying in order to generate various γ as is shown in Figure 2. The results are averaged over 30 trials... APPR and ℓ1-reg. PR have two parameters, the teleportation parameter α and a tolerance/ℓ1-regularization parameter ρ. ...For each parameter setting we use sweep cut to round the output of APPR and ℓ1-reg. PR to find a cluster of low conductance.