Regularized spectral methods for clustering signed networks

Authors: Mihai Cucuringu, Apoorv Vikram Singh, Déborah Sulem, Hemant Tyagi

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an extensive set of numerical experiments on synthetic data, and three real-world data sets standard in the signed networks literature.
Researcher Affiliation | Academia | Mihai Cucuringu EMAIL Department of Statistics and Mathematical Institute, University of Oxford; The Alan Turing Institute, London, UK. Apoorv Vikram Singh EMAIL Department of Computer Science and Engineering, New York University. Déborah Sulem EMAIL Department of Statistics, University of Oxford. Hemant Tyagi EMAIL Inria, Univ. Lille, CNRS, UMR 8524 Laboratoire Paul Painlevé, F-59000
Pseudocode | No | The paper describes algorithms and procedures textually and mathematically throughout Sections 2 and 6, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Python implementations of a suite of algorithms for signed clustering are available at https://github.com/alan-turing-institute/signet
Open Datasets | Yes | Finally, we measure the performance of our unregularized and regularized algorithms on three benchmark data sets in the signed clustering problem: the Wikipedia Requests for Adminship, the Slashdot Zoo, and Bitcoin data sets from Leskovec and Krevl (2014).
Dataset Splits | No | The paper describes the generation of synthetic data and the number of runs for experiments but does not provide explicit training/test/validation splits for either synthetic or real-world datasets. For instance, in Section 6.2, it states: "1. Fix s̃_1 = 1/k, pick a value for ρ and compute s̃_k = s̃_1/ρ. 2. For i ∈ [2, k−1], sample s̃_i from the uniform distribution on the interval [s̃_1, s̃_k]. 3. Compute the proportions s_i = s̃_i / Σ_{j=1}^k s̃_j, and then sample the graph from the resulting proportions. 4. Repeat 20 times the steps 1-3 mentioned above, and record the average performance over the 20 runs."
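The community-size sampling procedure quoted above can be sketched in a few lines of NumPy. This is a minimal illustration under our own naming and seeding conventions; the function name and interface are not from the paper:

```python
import numpy as np

def sample_community_proportions(k, rho, seed=None):
    """Sketch of the quoted procedure: fix s~_1 = 1/k, set
    s~_k = s~_1 / rho, draw the intermediate sizes uniformly
    from [s~_1, s~_k], then normalize to proportions."""
    rng = np.random.default_rng(seed)
    s = np.empty(k)
    s[0] = 1.0 / k          # s~_1 = 1/k
    s[-1] = s[0] / rho      # s~_k = s~_1 / rho
    s[1:-1] = rng.uniform(s[0], s[-1], size=k - 2)  # s~_i ~ U[s~_1, s~_k]
    return s / s.sum()      # proportions summing to one

# Example: k = 5 communities with size ratio rho = 1/k (unequal sizes)
props = sample_community_proportions(5, rho=1 / 5, seed=0)
```

With ρ = 1 every intermediate draw collapses to 1/k, recovering the equal-size case; smaller ρ stretches the largest community relative to the smallest.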
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running its experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions using "Python implementation", the "scikit-learn package", and the "signet package" but does not specify any version numbers for these software components. For example, it states: "using the implementation in scikit-learn of the k-means++ algorithm" and "with the implementation in the signet package (Cucuringu et al., 2019)".
Experiment Setup | Yes | In the following experiments, the Signed Stochastic Block Model will be sampled with the following set of parameters: the number of nodes n = 5000, the number of communities k ∈ {3, 5, 10, 20}, the relative size of communities ρ = 1 (equal-size clusters) and ρ = 1/k (non-equal-size clusters). For the edge density parameter p, we choose two sparsity regimes, Regime I and Regime II... The noise level η is chosen such that the recovery of the clusters is unsatisfactory for a subset of pairs of parameters (τ+, τ−). For each set of parameters, we sample 20 graphs from the SSBM and average the resulting ARI. For the combinatorial and symmetric Signed Laplacians L̄ and L̄_sym, we compute (k−1)-dimensional embeddings before applying the k-means++ algorithm. For all other methods, we use the k smallest eigenvectors.
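The quoted pipeline (spectral embedding of a signed graph followed by k-means++ on the embedded points) can be illustrated on a toy noiseless signed block model. This sketch uses a standard symmetric signed Laplacian and SciPy's `kmeans2` with k-means++ initialization in place of the scikit-learn implementation; all function names are ours, and the paper's regularized estimators differ in detail:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def signed_laplacian_embedding(A, dim):
    """Return the `dim` eigenvectors with smallest eigenvalues of the
    symmetric signed Laplacian Lsym = I - D^{-1/2} A D^{-1/2}, where D
    is the diagonal matrix of absolute degrees of the signed adjacency A."""
    d = np.abs(A).sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigh sorts eigenvalues in ascending order
    return vecs[:, :dim]

# Toy signed graph: +1 edges inside communities, -1 edges across.
n, k = 40, 2
labels_true = np.repeat([0, 1], n // 2)
A = np.where(labels_true[:, None] == labels_true[None, :], 1.0, -1.0)
np.fill_diagonal(A, 0.0)

# (k-1)-dimensional embedding for the symmetric Signed Laplacian,
# as in the quoted setup, then k-means++ on the embedded points.
X = signed_laplacian_embedding(A, k - 1)
_, labels = kmeans2(X, k, minit='++', seed=1)
```

On this noiseless example the zero eigenvector of L̄_sym aligns with the community indicator, so the one-dimensional embedding separates the two clusters exactly.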