Query-Efficient Correlation Clustering with Noisy Oracle
Authors: Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi, Wei Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings by demonstrating that KC-FC and KC-FB outperform baseline methods in terms of the sample complexity and cost of clustering, respectively. |
| Researcher Affiliation | Collaboration | Yuko Kuroki, CENTAI Institute, Turin, Italy; Atsushi Miyauchi, CENTAI Institute, Turin, Italy; Francesco Bonchi, CENTAI Institute, Turin, Italy and Eurecat, Barcelona, Spain; Wei Chen, Microsoft Research, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Kwik Cluster with Fixed Confidence (KC-FC), Algorithm 2 Threshold Bandits for identifying High Similarity pairs with ϵ ∈ (0, 0.5) (TB-HS), Algorithm 3 Kwik Cluster with Fixed Budget (KC-FB), Algorithm 4 Kwik Cluster(V, s), Algorithm 5 KC-FC variant with sequential use of TB-HS, Algorithm 6 Uniform sampling in the FC setting (Uniform-FC), Algorithm 7 Uniform sampling in the FB setting (Uniform-FB) |
| Open Source Code | Yes | The code was written in Python 3, which is available online. |
| Open Datasets | Yes | Datasets. We use publicly available real-world graphs presented in Table 1. |
| Dataset Splits | No | The paper describes how instances were generated (e.g., varying LB_min, embedding with node2vec) but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 percentages or specific sample counts) or cross-validation setup. |
| Hardware Specification | Yes | The experiments were performed on a machine with Apple M1 Chip and 16 GB RAM. |
| Software Dependencies | No | The paper mentions 'Python 3' and 'node2vec' but does not provide specific version numbers for these software components to ensure reproducibility (e.g., 'Python 3.x', 'node2vec vX.Y.Z'). |
| Experiment Setup | Yes | In both KC-FC and Uniform-FC, we set ϵ = 1/n, allowing each element to make only 1/n mistakes, and δ = 0.01, following a standard choice in PE-MAB. For each graph, we vary the lower bound on Δ_min, which we denote by LB_min, in {0.10, 0.15, 0.20, . . . , 0.50}. For large instances with n ≥ 1,000, we fix T = n^2.2 for scalability. |
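The pseudocode row above lists Algorithm 4, Kwik Cluster(V, s), which the paper's query-efficient variants build on. The classic pivot-based scheme repeatedly picks a random pivot, groups it with all remaining elements the similarity oracle deems similar, and recurses on the rest. A minimal sketch, assuming a Boolean pairwise oracle `similar` (a hypothetical stand-in for the paper's noisy oracle `s`):

```python
import random

def kwik_cluster(nodes, similar):
    """Pivot-based KwikCluster sketch.

    `similar(u, v)` is a hypothetical noiseless oracle returning True
    when u and v belong to the same cluster; the paper's algorithms
    replace it with estimates obtained from noisy similarity queries.
    """
    remaining = set(nodes)
    clusters = []
    while remaining:
        # Pick a uniformly random pivot from the unclustered elements.
        pivot = random.choice(sorted(remaining))
        # The pivot's cluster: the pivot plus every similar element.
        cluster = {pivot} | {v for v in remaining
                             if v != pivot and similar(pivot, v)}
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

With a consistent (noiseless) oracle induced by a ground-truth partition, this recovers the partition exactly regardless of pivot order; the interest in the noisy-oracle setting is bounding how many queries are needed to simulate such an oracle reliably.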