Local algorithms for interactive clustering

Authors: Pranjal Awasthi, Maria Florina Balcan, Konstantin Voevodski

Venue: JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also show that our framework works well in practice. (Section 5, Experimental Results) We perform two sets of experiments. We first test our proposed split algorithm on the clustering of business listings maintained by Google. We then test the proposed model in its entirety on the 20 Newsgroups data set.
Researcher Affiliation | Collaboration | Pranjal Awasthi (Department of Computer Science, Rutgers University); Maria Florina Balcan (School of Computer Science, Carnegie Mellon University); Konstantin Voevodski (Google, NY, USA)
Pseudocode | Yes | Figure 1: Local interactive clustering model; Figure 2: Split procedure (Algorithm: SPLIT PROCEDURE); Figure 3: Merge procedure (Algorithm: MERGE PROCEDURE); Figure 4: Merge procedure for the correlation-clustering objective (Algorithm: MERGE PROCEDURE); Figure 5: Split procedure under stronger assumptions (Algorithm: SPLIT PROCEDURE); Figure 6: Merge procedure under strict separation (Algorithm: MERGE PROCEDURE); Figure 7: Merge procedure under strict threshold separation (Algorithm: MERGE PROCEDURE); Figure 8: Merge procedure for the unrestricted-merge model (Algorithm: MERGE PROCEDURE)
Open Source Code | No | No explicit statement about providing source code for the methodology described in this paper was found. Footnote 2 mentions the availability of "anonymized Google business listings data sets" but not the code.
Open Datasets | Yes | The anonymized Google business listings data sets are available here [footnote 2]. We also test our entire interactive clustering framework on the 20 Newsgroups data set [footnote 3]. (Footnote 3: http://people.csail.mit.edu/jrennie/20Newsgroups/)
Dataset Splits | No | The paper describes generating initial clusterings by perturbing the ground truth and running experiments on original and pruned data sets, but it does not specify explicit train/test/validation splits for evaluating the clustering algorithms.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments. No specific GPU, CPU, or other hardware details are provided in the experimental sections.
Software Dependencies | No | The paper does not name any software dependencies or version numbers. It mentions techniques such as "term frequency inverse document frequency (tf-idf) vector" and "cosine similarity", but it does not specify the libraries, or their versions, used to implement them.
Experiment Setup | No | The paper describes how initial clusterings are generated and how data sets are pruned, and mentions that "cosine similarity" is used. However, it does not provide specific hyperparameters for the clustering algorithms (e.g., learning rates, batch sizes, number of epochs for iterative methods) or other explicit configuration details typically found in an experimental setup section.
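
Since the paper names tf-idf vectors and cosine similarity but no library, the following is only a minimal standard-library sketch of that similarity computation, not the authors' implementation. The function names, the whitespace tokenizer, and the unsmoothed `log(N / df)` idf weighting are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Assumption: whitespace tokenization and idf = log(N / df), no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)
```

Calling `tfidf_vectors` on a list of strings and passing any two of the resulting vectors to `cosine_similarity` yields a score in [0, 1]; documents that share no terms score 0.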