reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Signed Laplacians for Constrained Graph Clustering

Authors: John Stewart Fabila Carrasco, He Sun

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted experiments to compare the spectral clustering method with the constrained clustering approach, using both synthetic and real-world datasets. The experimental results, visualised in Figure 1, illustrate the superior performance of CC++ compared to traditional SC, particularly as the inter-cluster edge probability q increases. Our experiments on synthetic and real-world datasets confirm the robustness and scalability of our algorithm, significantly outperforming standard spectral clustering under challenging scenarios.
Researcher Affiliation	Academia	1School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. Correspondence to: He Sun <EMAIL>.
Pseudocode	Yes	See Algorithm 1 for the formal description of our algorithm. Algorithm 1 The Constrained Clustering Algorithm
Open Source Code	No	The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository.
Open Datasets	Yes	We considered a binary Stochastic Block Model (SBM) with n = 1, 000 vertices divided into two equal-sized communities. Based on the Geometric Random Graphs (RGG) (Avrachenkov et al., 2021; Dall & Christensen, 2002), we generated two clusters of vertices... We evaluated SC and CC++ on real-world temperature data from ground stations in Brittany, January 2014 (Girault, 2015).
Dataset Splits	No	The paper describes the generation of synthetic datasets and the use of a real-world dataset but does not provide specific training/test/validation splits. For the synthetic datasets, it describes how the graphs were generated and varied (e.g., 'n = 1,000 vertices divided into two equal-sized communities'), and for the real-world data, it mentions using 'all available measurements, one per hour, across 24 hours over 31 days, totaling 744 observations', but no explicit splits for training, validation, or testing are detailed for any dataset.
Hardware Specification	Yes	All simulations were run on a PC equipped with an Intel Core i7-10610U CPU running at 1.80 GHz and 32 GB of RAM, using MATLAB R2024a for computation.
Software Dependencies	Yes	All simulations were run on a PC equipped with an Intel Core i7-10610U CPU running at 1.80 GHz and 32 GB of RAM, using MATLAB R2024a for computation.
Experiment Setup	Yes	Specifically, we fixed p = 0.2 and varied q from 0.12 to 0.2 in 30 equidistant steps. Each configuration of cluster separation was repeated 10 times, and we generated G and H on the same set of vertices but with different connectivity structures. Parameters were set to σ2 1 = 5 108 and σ2 = 105 (Girault, 2015).