reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Statistical guarantees for local graph clustering

Authors: Wooseok Ha, Kimon Fountoulakis, Michael W. Mahoney

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we consider the so-called ℓ1-regularized Page Rank algorithm (Fountoulakis et al., 2019), a popular algorithm for the local graph clustering problem, and we establish statistical recoverability guarantees for it... Empirically we demonstrate the ability of the method to recover the target cluster in a range of real-world data graphs. Finally, we establish an equivalence between ℓ1-regularized Page Rank and the very popular local graph clustering algorithm from Andersen et al. (2006); Zhu et al. (2013). This allows us to prove similar average case guarantees for the algorithm of Andersen et al. (2006); Zhu et al. (2013) as well.
Researcher Affiliation	Academia	Wooseok Ha EMAIL Department of Statistics, University of California at Berkeley, Berkeley, CA, USA. Kimon Fountoulakis EMAIL School of Computer Science, University of Waterloo, Waterloo, ON, Canada. Michael W. Mahoney EMAIL ICSI and Department of Statistics, University of California at Berkeley, Berkeley, CA, USA.
Pseudocode	Yes	Algorithm 1: A modiﬁed BFS algorithm for seed set expansion
Open Source Code	No	The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets	Yes	Sﬂd. This dataset contains pairwise similarities of blasted sequences of 232 proteins belonging to the amidohydrolase superfamily (Brown et al., 2006)... PPI-mips. This dataset is a protein-protein interaction graph of mammalian species (Pagel et al., 2004)... FB-Johns55. This graph is a Facebook anonymized dataset... from Traud et al. (2011, 2012)... Orkut. Orkut is a free on-line social network... It can be downloaded from Leskovec and Krevl (2014).
Dataset Splits	No	The paper describes generating synthetic data and using existing 'ground truth clusters' for real datasets, but it does not specify explicit training/test/validation splits or percentages for these datasets to reproduce experiments.
Hardware Specification	No	The paper includes a 'Memory scalability' section and tables with running times, but it does not specify any particular CPU, GPU, or other hardware used for the experiments.
Software Dependencies	No	The paper mentions using a 'proximal coordinate descent algorithm' and refers to 'Dinic’s algorithm' but does not specify any software names with version numbers for implementation or execution of the methods described.
Experiment Setup	Yes	For the experiments in Figure 2, we ﬁx the teleportation parameter α = 0.1. We generate graphs from the stochastic block model which consists of 10 clusters, each of which has 20 nodes and only one of which is the target cluster K. We use the same parameters p and q across different clusters to generate edges within and between clusters. Here we set p = 0.5 and q is varying in order to generate various γ as is shown in Figure 2. The results are averaged over 30 trials... APPR and ℓ1-reg. PR have two parameters, the teleportation parameter α and a tolerance/ℓ1-regularization parameter ρ. ...For each parameter setting we use sweep cut to round the output of APPR and ℓ1-reg. PR to ﬁnd a cluster of low conductance.