reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Chromatic Correlation Clustering, Revisited

Authors: Qing Xiu, Kai Han, Jing Tang, Shuang Cui, He Huang

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments using real-world datasets and the results strongly demonstrate the superiorities of Greedy Vote in terms of both clustering quality and efficiency, compared to the state-of-the-art algorithms with or without performance guarantees.In this section, we compare our Greedy Vote (GV) algorithm (i.e., Algorithm 2) with six state-of-the-art algorithms for the CCC problem, including: (1) The Pivot Algorithm [2]; (2) The Reduce and Cluster (RC) Algorithm [3]; (3) The Deep Cluster (DC) algorithm [3]; (4) The Chromatic Ball (CB) algorithm [6]; (5) The Greedy Expansion (GE) Algorithm [25]; and (6) The modified greedy expansion (GER) Algorithm [25].
Researcher Affiliation	Academia	Qing Xiu School of Computer Science and Technology Suzhou Institute for Advanced Research University of Science and Technology of China EMAIL Han School of Computer Science and Technology Soochow University EMAIL Tang The Hong Kong University of Science and Technology (Guangzhou) The Hong Kong University of Science and Technology EMAIL Cui School of Computer Science and Technology Suzhou Institute for Advanced Research University of Science and Technology of China EMAIL Huang School of Computer Science and Technology Soochow University EMAIL
Pseudocode	Yes	Algorithm 1: LP-Clustering and Algorithm 2: Greedy Vote
Open Source Code	Yes	Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplemental material
Open Datasets	Yes	We use ten real-world datasets from different domains in our experiments, as listed in Table 1. Among these datasets, Facebook1, Facebook2, Lastfm and Twitter are social network graphs downloaded from [28], where vertices represent users and edges represent friendships. Following the setting in Anava et al. [3], we assign each social circle a color and set the color of each edge uv as the color of u and v s common social circle (if there are multiple common circles then choose an arbitrary one). The DAWN dataset [1][25] models a drug abuse warning network... The Cooking dataset [23][25] models an ingredient co-occur network... String1 and String2 [12] are networks containing protein-protein interactions... Finally, DBLP and MAG [25] are co-authorship networks...
Dataset Splits	No	Section 6 describes the datasets used and some parameter settings for the algorithms, but it does not specify how the datasets were split into training, validation, or test sets.
Hardware Specification	Yes	All our experiments are conducted on an Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz with 128GB of RAM.
Software Dependencies	No	We implement the evaluated algorithms using Python and also use the public code of Klodt et al. [25] for implementation. No specific version numbers for Python or other software libraries are provided.
Experiment Setup	Yes	In our experiments, we set ϵ = 0.55 and m = 10 for the GV algorithm, and follow all the parameter settings (if any) of other algorithms according to their proposals.