reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Local algorithms for interactive clustering

Authors: Pranjal Awasthi, Maria Florina Balcan, Konstantin Voevodski

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also show that our framework works well in practice. [Section 5, Experimental Results] We perform two sets of experiments. We first test our proposed split algorithm on the clustering of business listings maintained by Google. We then test the proposed model in its entirety on the 20 Newsgroups data set.
Researcher Affiliation	Collaboration	Pranjal Awasthi: Department of Computer Science, Rutgers University; Maria Florina Balcan: School of Computer Science, Carnegie Mellon University; Konstantin Voevodski: Google, NY, USA
Pseudocode	Yes	Figure 1: Local interactive clustering model; Figure 2: Split procedure (Algorithm: SPLIT PROCEDURE); Figure 3: Merge procedure (Algorithm: MERGE PROCEDURE); Figure 4: Merge procedure for the correlation-clustering objective (Algorithm: MERGE PROCEDURE); Figure 5: Split procedure under stronger assumptions (Algorithm: SPLIT PROCEDURE); Figure 6: Merge procedure under strict separation (Algorithm: MERGE PROCEDURE); Figure 7: Merge procedure under strict threshold separation (Algorithm: MERGE PROCEDURE); Figure 8: Merge procedure for the unrestricted-merge model (Algorithm: MERGE PROCEDURE)
Open Source Code	No	No explicit statement about providing source code for the methodology described in this paper was found. Footnote 2 mentions the availability of "anonymized Google business listings data sets" but not the code.
Open Datasets	Yes	The anonymized Google business listings data sets are available here (footnote 2). We also test our entire interactive clustering framework on the 20 Newsgroups data set (footnote 3: http://people.csail.mit.edu/jrennie/20Newsgroups/).
Dataset Splits	No	The paper describes generating initial clusterings by perturbing the ground-truth and performing experiments with original and pruned data sets, but it does not specify explicit train/test/validation splits for the evaluation of the clustering algorithms.
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments. No specific GPU, CPU, or other hardware details are provided in the experimental sections.
Software Dependencies	No	The paper does not list ancillary software dependencies with version numbers. Although it mentions techniques such as 'term frequency inverse document frequency (tf-idf) vector' and 'cosine similarity', it does not name the software libraries, or their versions, used for the implementation.
Experiment Setup	No	The paper describes how initial clusterings are generated and how data sets are pruned, and mentions that 'cosine similarity' is used. However, it does not provide specific hyperparameters for the clustering algorithms (e.g., learning rates, batch sizes, number of epochs for iterative methods) or other explicit configuration details typically found in an experimental setup section.