Online Sparsification of Bipartite-Like Clusters in Graphs
Authors: Joyentanuj Das, Suranjan De, He Sun
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experimental studies on both synthetic and real-world datasets, and show that our algorithms significantly speedup the running time of existing clustering algorithms while preserving their effectiveness. |
| Researcher Affiliation | Academia | 1School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. Correspondence to: He Sun <EMAIL>. |
| Pseudocode | No | The paper describes the algorithms in text, including 'Our algorithm is similar with (Sun & Zanetti, 2019) at a high level' and details of the sampling process, but does not include a dedicated pseudocode block or a clearly labeled algorithm with structured steps. |
| Open Source Code | Yes | Our code can be downloaded from https://github.com/suranjande4/Online-Sparsification-of-Bipartite-Like-Clusters-in-Graphs. |
| Open Datasets | Yes | We evaluate the performance of our algorithm on the Dyadic Militarised Interstate Disputes Dataset (v3.1) (Maoz et al., 2019), which records every interstate dispute during 1900 1950... Now we evaluate the algorithms performance on the US Migration Dataset (U.S. Census Bureau, 2000). |
| Dataset Splits | No | The paper describes synthetic data generation parameters (e.g., 'graph has k = 2 clusters', 'n1 = n2', 'q = 0.1p') and mentions using real-world datasets with a local algorithm (e.g., 'a random vertex from C1 C2 is chosen as the starting vertex'), but does not specify explicit training/test/validation dataset splits in terms of percentages or counts needed for reproduction in a machine learning context. |
| Hardware Specification | Yes | All experiments were performed on a HP ZBook Studio with 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz processor and 32 GB of RAM. |
| Software Dependencies | No | The paper provides a link to its code repository but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experimental environment. |
| Experiment Setup | Yes | Specifically, we assume that the graph has k = 2 clusters, say C1, C2, and the number of vertices in each cluster, denoted by n1 and n2 respectively, satisfies n1 = n2. Moreover, any pair of vertices u Ci and v Cj is connected with probability pij. We assume that p12 = p21 = p and p11 = p22 = q, where q = 0.1p. Throughout the experiments, we leave the parameters n and p free but maintain the above relations... We set γ = 0.02 for the MS algorithm... We fix the value of η = 0.7 and increase the number of vertices in each partition from 500 to 2, 500... |