Statistical guarantees for local graph clustering
Authors: Wooseok Ha, Kimon Fountoulakis, Michael W. Mahoney
JMLR 2021 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we consider the so-called ℓ1-regularized Page Rank algorithm (Fountoulakis et al., 2019), a popular algorithm for the local graph clustering problem, and we establish statistical recoverability guarantees for it... Empirically we demonstrate the ability of the method to recover the target cluster in a range of real-world data graphs. Finally, we establish an equivalence between ℓ1-regularized Page Rank and the very popular local graph clustering algorithm from Andersen et al. (2006); Zhu et al. (2013). This allows us to prove similar average case guarantees for the algorithm of Andersen et al. (2006); Zhu et al. (2013) as well. |
| Researcher Affiliation | Academia | Wooseok Ha EMAIL Department of Statistics, University of California at Berkeley, Berkeley, CA, USA. Kimon Fountoulakis EMAIL School of Computer Science, University of Waterloo, Waterloo, ON, Canada. Michael W. Mahoney EMAIL ICSI and Department of Statistics, University of California at Berkeley, Berkeley, CA, USA. |
| Pseudocode | Yes | Algorithm 1: A modified BFS algorithm for seed set expansion |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Sfld. This dataset contains pairwise similarities of blasted sequences of 232 proteins belonging to the amidohydrolase superfamily (Brown et al., 2006)... PPI-mips. This dataset is a protein-protein interaction graph of mammalian species (Pagel et al., 2004)... FB-Johns55. This graph is a Facebook anonymized dataset... from Traud et al. (2011, 2012)... Orkut. Orkut is a free on-line social network... It can be downloaded from Leskovec and Krevl (2014). |
| Dataset Splits | No | The paper describes generating synthetic data and using existing 'ground truth clusters' for real datasets, but it does not specify explicit training/test/validation splits or percentages for these datasets to reproduce experiments. |
| Hardware Specification | No | The paper includes a 'Memory scalability' section and tables with running times, but it does not specify any particular CPU, GPU, or other hardware used for the experiments. |
| Software Dependencies | No | The paper mentions using a 'proximal coordinate descent algorithm' and refers to 'Dinic’s algorithm' but does not specify any software names with version numbers for implementation or execution of the methods described. |
| Experiment Setup | Yes | For the experiments in Figure 2, we fix the teleportation parameter α = 0.1. We generate graphs from the stochastic block model which consists of 10 clusters, each of which has 20 nodes and only one of which is the target cluster K. We use the same parameters p and q across different clusters to generate edges within and between clusters. Here we set p = 0.5 and q is varying in order to generate various γ as is shown in Figure 2. The results are averaged over 30 trials... APPR and ℓ1-reg. PR have two parameters, the teleportation parameter α and a tolerance/ℓ1-regularization parameter ρ. ...For each parameter setting we use sweep cut to round the output of APPR and ℓ1-reg. PR to find a cluster of low conductance. |