Regularized spectral methods for clustering signed networks

Authors: Mihai Cucuringu, Apoorv Vikram Singh, Déborah Sulem, Hemant Tyagi

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an extensive set of numerical experiments on synthetic data, and three real-world data sets standard in the signed networks literature.
Researcher Affiliation | Academia | Mihai Cucuringu EMAIL Department of Statistics and Mathematical Institute, University of Oxford; The Alan Turing Institute, London, UK. Apoorv Vikram Singh EMAIL Department of Computer Science and Engineering, New York University. Déborah Sulem EMAIL Department of Statistics, University of Oxford. Hemant Tyagi EMAIL Inria, Univ. Lille, CNRS, UMR 8524 Laboratoire Paul Painlevé, F-59000
Pseudocode | No | The paper describes algorithms and procedures textually and mathematically throughout Sections 2 and 6, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Python implementations of a suite of algorithms for signed clustering are available at https://github.com/alan-turing-institute/signet
Open Datasets | Yes | Finally, we measure the performance of our unregularized and regularized algorithms on three benchmark data sets in the signed clustering problem: the Wikipedia Requests for Adminship, the Slashdot Zoo, and Bitcoin data sets from Leskovec and Krevl (2014).
Dataset Splits | No | The paper describes the generation of synthetic data and the number of runs for experiments but does not provide explicit training/test/validation splits for either synthetic or real-world datasets. For instance, in Section 6.2, it states: "1. Fix s̃_1 = 1/k, pick a value for ρ and compute s̃_k = s̃_1/ρ. 2. For i ∈ [2, k−1], sample s̃_i from the uniform distribution on the interval [s̃_1, s̃_k]. 3. Compute the proportions s_i = s̃_i / Σ_{j=1}^k s̃_j, and then sample the graph from the resulting proportions. 4. Repeat 20 times the steps 1-3 mentioned above, and record the average performance over the 20 runs."
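The community-size sampling procedure quoted above can be sketched in a few lines of NumPy. This is a minimal illustration under our own naming and seeding conventions; the function name and interface are not from the paper:

```python
import numpy as np

def sample_community_proportions(k, rho, seed=None):
    """Sketch of the quoted procedure: fix s~_1 = 1/k, set
    s~_k = s~_1 / rho, draw the intermediate sizes uniformly
    from [s~_1, s~_k], then normalize to proportions."""
    rng = np.random.default_rng(seed)
    s = np.empty(k)
    s[0] = 1.0 / k          # s~_1 = 1/k
    s[-1] = s[0] / rho      # s~_k = s~_1 / rho
    s[1:-1] = rng.uniform(s[0], s[-1], size=k - 2)  # s~_i ~ U[s~_1, s~_k]
    return s / s.sum()      # proportions summing to one

# Example: k = 5 communities with size ratio rho = 1/k (unequal sizes)
props = sample_community_proportions(5, rho=1 / 5, seed=0)
```

With ρ = 1 every intermediate draw collapses to 1/k, recovering the equal-size case; smaller ρ stretches the largest community relative to the smallest.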
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions using "Python implementation", the "scikit-learn package", and the "signet package" but does not specify any version numbers for these software components. For example, it states: "using the implementation in scikit-learn of the k-means++ algorithm" and "with the implementation in the signet package (Cucuringu et al., 2019)".
Experiment Setup | Yes | In the following experiments, the Signed Stochastic Block Model will be sampled with the following set of parameters: the number of nodes n = 5000, the number of communities k ∈ {3, 5, 10, 20}, the relative size of communities ρ = 1 (equal-size clusters) and ρ = 1/k (non-equal-size clusters). For the edge density parameter p, we choose two sparsity regimes, Regime I and Regime II... The noise level η is chosen such that the recovery of the clusters is unsatisfactory for a subset of pairs of parameters (τ+, τ−). For each set of parameters, we sample 20 graphs from the SSBM and average the resulting ARI. For the combinatorial and symmetric Signed Laplacians L̄ and L̄_sym, we compute (k−1)-dimensional embeddings before applying the k-means++ algorithm. For all other methods, we use the k smallest eigenvectors.
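The quoted pipeline (spectral embedding of a signed graph followed by k-means++ on the embedded points) can be illustrated on a toy noiseless signed block model. This sketch uses a standard symmetric signed Laplacian and SciPy's `kmeans2` with k-means++ initialization in place of the scikit-learn implementation; all function names are ours, and the paper's regularized estimators differ in detail:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def signed_laplacian_embedding(A, dim):
    """Return the `dim` eigenvectors with smallest eigenvalues of the
    symmetric signed Laplacian Lsym = I - D^{-1/2} A D^{-1/2}, where D
    is the diagonal matrix of absolute degrees of the signed adjacency A."""
    d = np.abs(A).sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigh sorts eigenvalues in ascending order
    return vecs[:, :dim]

# Toy signed graph: +1 edges inside communities, -1 edges across.
n, k = 40, 2
labels_true = np.repeat([0, 1], n // 2)
A = np.where(labels_true[:, None] == labels_true[None, :], 1.0, -1.0)
np.fill_diagonal(A, 0.0)

# (k-1)-dimensional embedding for the symmetric Signed Laplacian,
# as in the quoted setup, then k-means++ on the embedded points.
X = signed_laplacian_embedding(A, k - 1)
_, labels = kmeans2(X, k, minit='++', seed=1)
```

On this noiseless example the zero eigenvector of L̄_sym aligns with the community indicator, so the one-dimensional embedding separates the two clusters exactly.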