Sparse-pivot: Dynamic correlation clustering for node insertions
Authors: Mina Dalirrooyfard, Konstantin Makarychev, Slobodan Mitrović
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate that our approximation guarantee is better than that of REFERENCE CLUSTERING and the algorithm of (Cohen-Addad et al.). We evaluate the algorithms on two types of graphs. (1) Sparse real-world graphs from SNAP (Jure, 2014): a social network (musae-facebook), an email network (email-Enron), a collaboration network (ca-AstroPh), and a paper citation network (cit-HepTh). (2) The drift dataset (Vergara et al., 2012; Rodriguez-Lujan et al., 2014) from the UCI Machine Learning Repository (Dua et al., 2017), which includes 13,910 points embedded in a space of 129 dimensions. Baselines: We use three baselines: making all nodes singletons, which we call SINGLETONS, DYNAMIC-AGREEMENT, and REFERENCE CLUSTERING. Results: Approximation Guarantee: For all the datasets, our approximation guarantee is better than DYNAMIC-AGREEMENT and SINGLETONS. Figure 1 shows this objective for one of these graphs, and the rest can be found in Appendix F. The average clustering objective for the drift dataset graphs is shown in Table 1. |
| Researcher Affiliation | Collaboration | 1Machine Learning Research, Morgan Stanley, Canada. EMAIL. 2Northwestern University, Chicago, USA. EMAIL. 3University of California, Davis, USA. EMAIL. |
| Pseudocode | Yes | Algorithm 1 Insertion for REFERENCE CLUSTERING 1: input node u, graph G, ordering π, pivot array p. 2: Compute p(u) = arg min_{w ∈ N[u]} π(w). 3: if p(u) = u: 4: Mark u as a pivot. 5: Create a new cluster with u in it. 6: Run EXPLORE(u, G, π, p). 7: else if p(u) is a pivot: 8: Put u in p(u)'s cluster. 9: else: 10: Make u a singleton. |
| Open Source Code | No | No explicit statement or link for open-source code was found in the paper. |
| Open Datasets | Yes | We evaluate the algorithms on two types of graphs. (1) Sparse real-world graphs from SNAP (Jure, 2014): a social network (musae-facebook), an email network (email-Enron), a collaboration network (ca-AstroPh), and a paper citation network (cit-HepTh). (2) The drift dataset (Vergara et al., 2012; Rodriguez-Lujan et al., 2014) from the UCI Machine Learning Repository (Dua et al., 2017), which includes 13,910 points embedded in a space of 129 dimensions. |
| Dataset Splits | Yes | We choose a random ordering for the arrival of the nodes, and at each step, with probability 0.8, we insert the next node, and with probability 0.2, we delete a random node. If all the nodes have been inserted once, we delete them until no node is left. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) were found in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) were found in the paper. |
| Experiment Setup | Yes | We set the parameter ε for SPARSE-PIVOT to be 0.1. ... In BREAK-CLUSTER, for each node u ∈ Bv, we sample O(log n) nodes in Bv. If u is attached to half of them, we add u to Cv. In UPDATE-CLUSTER, we add u to Cv, even though u might not be attached to many nodes in Cv. After at least ε|Bv| nodes are added to Bv, we rerun BREAK-CLUSTER. ... Furthermore, we run RECOMPUTE every time the number of deletions reaches εN, instead of the total number of updates. |
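The insertion pseudocode extracted above (Algorithm 1 for REFERENCE CLUSTERING) can be sketched in Python. This is an illustrative reconstruction, not the authors' code (none is released): the data-structure names (`pivot_of`, `is_pivot`, `clusters`) are assumptions, and the paper's EXPLORE subroutine is omitted.

```python
def insert_node(u, graph, pi, pivot_of, is_pivot, clusters):
    """Sketch of Algorithm 1: insert node u into the reference clustering.

    graph[v] is the set of neighbors of v; pi[v] is v's position in the
    random ordering pi; pivot_of, is_pivot, clusters hold the clustering
    state. All names are illustrative; EXPLORE is omitted.
    """
    # p(u) = arg min_{w in N[u]} pi(w), over the closed neighborhood of u
    closed_nbhd = graph[u] | {u}
    p_u = min(closed_nbhd, key=lambda w: pi[w])
    pivot_of[u] = p_u

    if p_u == u:
        # u has the smallest rank in its closed neighborhood: new pivot
        is_pivot.add(u)
        clusters[u] = {u}
        # the paper then runs EXPLORE(u, G, pi, p); omitted in this sketch
    elif p_u in is_pivot:
        # join the pivot's existing cluster
        clusters[p_u].add(u)
    else:
        # u's pivot is not itself a pivot: u stays a singleton
        clusters[u] = {u}
```

The three branches mirror lines 3-10 of the extracted pseudocode: become a pivot, join a pivot's cluster, or remain a singleton.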
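The update sequence described under "Dataset Splits" (insert the next node with probability 0.8, delete a random live node with probability 0.2, then delete the remainder once all nodes have arrived) can be sketched as follows. This is a minimal reconstruction; the function name, seed handling, and op encoding are assumptions, not the paper's code.

```python
import random

def update_stream(nodes, p_insert=0.8, seed=0):
    """Generate an (op, node) sequence matching the paper's described
    experiment: random arrival order; at each step insert the next node
    with probability p_insert, otherwise delete a random live node; once
    every node has been inserted, delete the rest until none is left.
    """
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)            # random ordering for node arrivals
    live, i, ops = [], 0, []
    while i < len(order) or live:
        if i < len(order) and (not live or rng.random() < p_insert):
            ops.append(("insert", order[i]))
            live.append(order[i])
            i += 1
        else:
            v = live.pop(rng.randrange(len(live)))
            ops.append(("delete", v))
    return ops
```

Every node is inserted exactly once and later deleted exactly once, so a run over n nodes yields 2n updates.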
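The sampling test quoted in the "Experiment Setup" row (sample O(log n) nodes and keep u if it is attached to at least half of them) can be sketched like this. The constant factor `c` and all names are assumptions for illustration; the paper does not specify them.

```python
import math
import random

def keep_in_cluster(u, cluster_nodes, graph, rng, c=3):
    """Sampling check in the spirit of the described BREAK-CLUSTER step:
    draw roughly c * log2(n) nodes from the cluster and keep u if it is
    adjacent to at least half of the sample. The constant c is assumed.
    """
    n = len(cluster_nodes)
    k = min(n, max(1, math.ceil(c * math.log2(max(2, n)))))
    sample = rng.sample(list(cluster_nodes), k)
    hits = sum(1 for w in sample if w in graph[u])
    return hits >= k / 2
```

Sampling makes the per-node check O(log n) instead of O(|cluster|), which is the point of this step in a dynamic setting.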