reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Sparsification of Bipartite-Like Clusters in Graphs

Authors: Joyentanuj Das, Suranjan De, He Sun

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experimental studies on both synthetic and real-world datasets, and show that our algorithms significantly speedup the running time of existing clustering algorithms while preserving their effectiveness.
Researcher Affiliation	Academia	1School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. Correspondence to: He Sun <EMAIL>.
Pseudocode	No	The paper describes the algorithms in text, including 'Our algorithm is similar with (Sun & Zanetti, 2019) at a high level' and details of the sampling process, but does not include a dedicated pseudocode block or a clearly labeled algorithm with structured steps.
Open Source Code	Yes	Our code can be downloaded from https://github.com/suranjande4/Online-Sparsification-of-Bipartite-Like-Clusters-in-Graphs.
Open Datasets	Yes	We evaluate the performance of our algorithm on the Dyadic Militarised Interstate Disputes Dataset (v3.1) (Maoz et al., 2019), which records every interstate dispute during 1900 1950... Now we evaluate the algorithms performance on the US Migration Dataset (U.S. Census Bureau, 2000).
Dataset Splits	No	The paper describes synthetic data generation parameters (e.g., 'graph has k = 2 clusters', 'n1 = n2', 'q = 0.1p') and mentions using real-world datasets with a local algorithm (e.g., 'a random vertex from C1 C2 is chosen as the starting vertex'), but does not specify explicit training/test/validation dataset splits in terms of percentages or counts needed for reproduction in a machine learning context.
Hardware Specification	Yes	All experiments were performed on a HP ZBook Studio with 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz processor and 32 GB of RAM.
Software Dependencies	No	The paper provides a link to its code repository but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experimental environment.
Experiment Setup	Yes	Specifically, we assume that the graph has k = 2 clusters, say C1, C2, and the number of vertices in each cluster, denoted by n1 and n2 respectively, satisfies n1 = n2. Moreover, any pair of vertices u Ci and v Cj is connected with probability pij. We assume that p12 = p21 = p and p11 = p22 = q, where q = 0.1p. Throughout the experiments, we leave the parameters n and p free but maintain the above relations... We set γ = 0.02 for the MS algorithm... We fix the value of η = 0.7 and increase the number of vertices in each partition from 500 to 2, 500...