reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering

Authors: Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee | Pages 4755-4762

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.
Researcher Affiliation	Academia	Xiaoyu Qin Monash University Victoria, Australia 3800 EMAIL Kai Ming Ting Federation University Victoria, Australia 3842 EMAIL Ye Zhu Deakin University Victoria, Australia 3125 EMAIL Vincent CS Lee Monash University Victoria, Australia 3800 EMAIL
Pseudocode	No	N/A
Open Source Code	Yes	All algorithms used in our experiments are implemented in Matlab (the source code with demo can be obtained from https://github.com/cswords/anne-dbscan-demo).
Open Datasets	Yes	The artificial datasets are from http://cs.uef.fi/sipu/datasets/ (Gionis, Mannila, and Tsaparas 2007; Zahn 1971; Chang and Yeung 2008; Jain and Law 2005) except that the hard distribution dataset is from https://sourceforge.net/p/density-ratio/ (Zhu, Ting, and Carman 2016), 5 high-dimensional data are from http://featureselection.asu.edu/datasets.php (Li et al. 2016), and the rest of the datasets are from http://archive.ics.uci.edu/ml (Dheeru and Karra Taniskidou 2017).
Dataset Splits	No	We compared all clustering results in terms of the best F1 score (Rijsbergen 1979) that is obtained from a search of the algorithm’s parameter. We search each parameter within a reasonable range.
Hardware Specification	Yes	The experiments ran on a machine having CPU: i5-8600k 4.30GHz processor, 8GB RAM; and GPU: GTX Titan X with 3072 1075MHz CUDA (Owens et al. 2008) cores & 12GB graphic memory.
Software Dependencies	No	All algorithms used in our experiments are implemented in Matlab (the source code with demo can be obtained from https://github.com/cswords/anne-dbscan-demo). We produced the GPU accelerated versions of all implementations.
Experiment Setup	Yes	The ranges used for all algorithms/dissimilarities are provided in Table 2.
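
The Research Type excerpt says the measure inside DBSCAN is replaced while the rest of the procedure is left unchanged. The sketch below illustrates only that idea and is not the authors' Matlab implementation: it is a minimal DBSCAN that works on a precomputed dissimilarity matrix, so swapping the measure changes nothing but the matrix. The names `dissimilarity_matrix`, `dbscan_precomputed`, `euclidean`, and `manhattan` are hypothetical, and `manhattan` is only a stand-in for a different measure. The paper's Isolation Similarity is not implemented here; a dissimilarity derived from it (for example, 1 minus the similarity) would be passed in the same way, under that assumption.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(x: Point, y: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Point, y: Point) -> float:
    # Stand-in for "some other measure"; NOT the paper's Isolation Similarity.
    return sum(abs(a - b) for a, b in zip(x, y))


def dissimilarity_matrix(points: List[Point],
                         dissim: Callable[[Point, Point], float]) -> List[List[float]]:
    # The only place where the choice of measure enters the pipeline.
    n = len(points)
    return [[dissim(points[i], points[j]) for j in range(n)] for i in range(n)]


def dbscan_precomputed(D: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN on a dissimilarity matrix. Returns -1 for noise, else a cluster id."""
    UNVISITED, NOISE = -2, -1
    n = len(D)
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        neighbours = [j for j in range(n) if D[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(neighbours)
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            nb_j = [k for k in range(n) if D[j][k] <= eps]
            if len(nb_j) >= min_pts:  # j is a core point, so keep expanding
                stack.extend(nb_j)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]

# Same clustering routine, two different measures.
labels_euclid = dbscan_precomputed(dissimilarity_matrix(points, euclidean), eps=1.5, min_pts=3)
labels_manhat = dbscan_precomputed(dissimilarity_matrix(points, manhattan), eps=2.0, min_pts=3)
print(labels_euclid)
print(labels_manhat)
```

Keeping the measure behind a matrix keeps `dbscan_precomputed` itself untouched when the measure changes, which mirrors the "rest of the procedure unchanged" claim. The `eps` threshold has to be re-tuned for each measure, which is consistent with the per-algorithm parameter search mentioned under Dataset Splits.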