Correlation Clustering Beyond the Pivot Algorithm
Authors: Soheil Behnezhad, Moses Charikar, Vincent Cohen-Addad, Alma Ghafari, Weiyun Ma
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the output of MODIFIEDPIVOT on average makes less than 77% of the mistakes made by PIVOT. More surprisingly, we prove theoretically that MODIFIEDPIVOT has approximation ratio 3 ε0 for some absolute constant ε0 > 0. This, e.g., leads to a better than 3 approximation in the dynamic setting in polylog time, improving the 3-approximation obtained by Behnezhad et al. (FOCS 19). [...] We also implement MODIFIEDPIVOT and compare its output with PIVOT on various publicly available data sets. Our empirical data suggests that MODIFIEDPIVOT makes less than 77% of the mistakes made by PIVOT on average. See Section 6. [...] In this set of experiments, we compare the objective values of PIVOT and MODIFIEDPIVOT algorithms, ensuring that both utilize the same randomness for the ordering of pivots. |
| Researcher Affiliation | Collaboration | 1 Northeastern University 2 Stanford University 3Google Research. |
| Pseudocode | Yes | In this paper, we present MODIFIEDPIVOT, an algorithm that locally improves the output of PIVOT. Our MODIFIEDPIVOT algorithm can be implemented just as efficiently as PIVOT in various settings. [...] Our MODIFIEDPIVOT algorithm is formalized below as Algorithm 1. [...] Algorithm 1 The MODIFIEDPIVOT algorithm. [...] Algorithm 2 Our charging scheme for MODIFIEDPIVOT. This algorithm is only used for the analysis of Algorithm 1. |
| Open Source Code | No | The paper discusses the implementation of MODIFIEDPIVOT and compares its performance through experiments, but it does not provide any explicit statement about making its source code publicly available, nor does it include a link to a code repository. |
| Open Datasets | Yes | We also implement MODIFIEDPIVOT and compare its output with PIVOT on various publicly available data sets. Our empirical data suggests that MODIFIEDPIVOT makes less than 77% of the mistakes made by PIVOT on average. See Section 6. [...] The first set of data consists of selected graphs, which represent real-life applications in (Davis & Hu, 2011). Our dataset is a selection that covers the following categories: communication networks (email), collaboration networks (Erdos991, Netscience), citation networks(Sma Gri), biological networks (celegans-neural, celegans-metabolic) and others (Harvard500, polblogs). Furthermore, this dataset has been featured in other studies on correlation clustering (Veldt, 2022; Veldt et al., 2018). |
| Dataset Splits | No | The paper describes generating data for Stochastic Block Models with parameters (e.g., pin = 0.1, pout = 0.9) and using 'multiple permutations of vertex orderings' for real-world datasets, which simulates different pivot sequences for evaluation, but it does not specify traditional training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes empirical results and experiments but does not provide any specific details about the hardware used to run these experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions implementing algorithms and conducting experiments, but it does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | For these experiments, we generate multiple permutations of vertex orderings to simulate different sequences in which pivots are selected. We emphasize that multiple random seeds per experiment have been used. For each permutation, we select values in the range [0,3, 0.8] for the parameters ε and δ. For each combination of ε and δ, we compute the clustering produced by MODIFIEDPIVOT and calculate its disagreements. We tested at most 8 choices for each of epsilon and delta in all the runs, which adds up to at most 64 combinations. [...] In our experiments, we set pin = 0.1 and pout = 0.9. |