Sparse-pivot: Dynamic correlation clustering for node insertions
Authors: Mina Dalirrooyfard, Konstantin Makarychev, Slobodan Mitrović
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate that our approximation guarantee is better than that of REFERENCE CLUSTERING and the algorithm of (Cohen-Addad et al.). We evaluate the algorithms on two types of graphs. (1) Sparse real-world graphs from SNAP (Jure, 2014): a social network (musae-facebook), an email network (email-Enron), a collaboration network (ca-AstroPh), and a paper citation network (cit-HepTh). (2) The drift dataset (Vergara et al., 2012; Rodriguez-Lujan et al., 2014) from the UCI Machine Learning Repository (Dua et al., 2017), which includes 13,910 points embedded in a space of 129 dimensions. Baselines: We use three baselines: making all nodes singletons, which we call SINGLETONS, DYNAMIC-AGREEMENT, and REFERENCE CLUSTERING. Results: Approximation Guarantee: For all the datasets, our approximation guarantee is better than DYNAMIC-AGREEMENT and SINGLETONS. Figure 1 shows this objective for one of these graphs, and the rest can be found in Appendix F. The average clustering objective for the drift dataset graphs is shown in Table 1. |
| Researcher Affiliation | Collaboration | 1Machine Learning Research, Morgan Stanley, Canada. EMAIL. 2Northwestern University, Chicago, USA. EMAIL. 3University of California, Davis, USA. EMAIL. |
| Pseudocode | Yes | Algorithm 1 Insertion for REFERENCE CLUSTERING 1: input node u, graph G, ordering π, pivot array p. 2: Compute p(u) = arg min_{w ∈ N[u]} π(w). 3: if p(u) = u: 4: Mark u as a pivot. 5: Create a new cluster with u in it. 6: Run EXPLORE(u, G, π, p). 7: else if p(u) is a pivot: 8: Put u in p(u)'s cluster. 9: else: 10: Make u a singleton. |
| Open Source Code | No | No explicit statement or link for open-source code was found in the paper. |
| Open Datasets | Yes | We evaluate the algorithms on two types of graphs. (1) Sparse real-world graphs from SNAP (Jure, 2014): a social network (musae-facebook), an email network (email-Enron), a collaboration network (ca-AstroPh), and a paper citation network (cit-HepTh). (2) The drift dataset (Vergara et al., 2012; Rodriguez-Lujan et al., 2014) from the UCI Machine Learning Repository (Dua et al., 2017), which includes 13,910 points embedded in a space of 129 dimensions. |
| Dataset Splits | Yes | We choose a random ordering for the arrival of the nodes, and at each step, with probability 0.8, we insert the next node, and with probability 0.2, we delete a random node. If all the nodes have been inserted once, we delete them until no node is left. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) were found in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) were found in the paper. |
| Experiment Setup | Yes | We set the parameter ε for SPARSE-PIVOT to be 0.1. ... In BREAK-CLUSTER, for each node u ∈ Bv, we sample O(log n) nodes in Bv. If u is attached to half of them, we add u to Cv. In UPDATE-CLUSTER, we add u to Cv, even though u might not be attached to many nodes in Cv. After at least ε|Bv| nodes are added to Bv, we rerun BREAK-CLUSTER. ... Furthermore, we run RECOMPUTE every time the number of deletions reaches εN, instead of the total number of updates. |
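The insertion pseudocode extracted above (Algorithm 1 for REFERENCE CLUSTERING) can be sketched in Python. This is an illustrative reconstruction, not the authors' code (none is released): the data-structure names (`pivot_of`, `is_pivot`, `clusters`) are assumptions, and the paper's EXPLORE subroutine is omitted.

```python
def insert_node(u, graph, pi, pivot_of, is_pivot, clusters):
    """Sketch of Algorithm 1: insert node u into the reference clustering.

    graph[v] is the set of neighbors of v; pi[v] is v's position in the
    random ordering pi; pivot_of, is_pivot, clusters hold the clustering
    state. All names are illustrative; EXPLORE is omitted.
    """
    # p(u) = arg min_{w in N[u]} pi(w), over the closed neighborhood of u
    closed_nbhd = graph[u] | {u}
    p_u = min(closed_nbhd, key=lambda w: pi[w])
    pivot_of[u] = p_u

    if p_u == u:
        # u has the smallest rank in its closed neighborhood: new pivot
        is_pivot.add(u)
        clusters[u] = {u}
        # the paper then runs EXPLORE(u, G, pi, p); omitted in this sketch
    elif p_u in is_pivot:
        # join the pivot's existing cluster
        clusters[p_u].add(u)
    else:
        # u's pivot is not itself a pivot: u stays a singleton
        clusters[u] = {u}
```

The three branches mirror lines 3-10 of the extracted pseudocode: become a pivot, join a pivot's cluster, or remain a singleton.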
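The update sequence described under "Dataset Splits" (insert the next node with probability 0.8, delete a random live node with probability 0.2, then delete the remainder once all nodes have arrived) can be sketched as follows. This is a minimal reconstruction; the function name, seed handling, and op encoding are assumptions, not the paper's code.

```python
import random

def update_stream(nodes, p_insert=0.8, seed=0):
    """Generate an (op, node) sequence matching the paper's described
    experiment: random arrival order; at each step insert the next node
    with probability p_insert, otherwise delete a random live node; once
    every node has been inserted, delete the rest until none is left.
    """
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)            # random ordering for node arrivals
    live, i, ops = [], 0, []
    while i < len(order) or live:
        if i < len(order) and (not live or rng.random() < p_insert):
            ops.append(("insert", order[i]))
            live.append(order[i])
            i += 1
        else:
            v = live.pop(rng.randrange(len(live)))
            ops.append(("delete", v))
    return ops
```

Every node is inserted exactly once and later deleted exactly once, so a run over n nodes yields 2n updates.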
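The sampling test quoted in the "Experiment Setup" row (sample O(log n) nodes and keep u if it is attached to at least half of them) can be sketched like this. The constant factor `c` and all names are assumptions for illustration; the paper does not specify them.

```python
import math
import random

def keep_in_cluster(u, cluster_nodes, graph, rng, c=3):
    """Sampling check in the spirit of the described BREAK-CLUSTER step:
    draw roughly c * log2(n) nodes from the cluster and keep u if it is
    adjacent to at least half of the sample. The constant c is assumed.
    """
    n = len(cluster_nodes)
    k = min(n, max(1, math.ceil(c * math.log2(max(2, n)))))
    sample = rng.sample(list(cluster_nodes), k)
    hits = sum(1 for w in sample if w in graph[u])
    return hits >= k / 2
```

Sampling makes the per-node check O(log n) instead of O(|cluster|), which is the point of this step in a dynamic setting.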