DBSCAN: Optimal Rates For Density-Based Cluster Estimation
Authors: Daren Wang, Xinyang Lu, Alessandro Rinaldo
JMLR 2019 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of optimal estimation of the density cluster tree under various smoothness assumptions on the underlying density. ... we formulate a new notion of clustering consistency ... and derive minimax rates for cluster tree estimation under H older smooth densities of arbitrary degree. We present a computationally eļ¬cient, rate optimal cluster tree estimator based on simple extensions of the popular DBSCAN algorithm ... Our procedure relies on kernel density estimators and returns a sequence of nested random geometric graphs ... The resulting optimal rates for cluster tree estimation depend on the degree of smoothness ... Our results complement and extend the analysis of the DBSCAN algorithm ... Finally, we consider level set estimation and cluster consistency for densities with jump discontinuities. We demonstrate that the DBSCAN algorithm attains the minimax rate in terms of the jump size and sample size in this setting as well. |
| Researcher Affiliation | Academia | Daren Wang EMAIL Department of Statistics University of Chicago Chicago, IL 60637, USA; Xinyang Lu EMAIL Mathematical Sciences Department Lakehead University Thunder Bay, ON P7B 5E1, Canada; Alessandro Rinaldo EMAIL Department of Statistics and Data Science Carnegie Mellon University Pittsburgh, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1 The DBSCAN algorithm. INPUT: i.i.d sample {Xi}n i=1, and h > 0. 1. For each k N, construct a graph Gh,k with nodes {Xi : |B(Xi, h) {Xj}n j=1| k} and edges (Xi, Xj) if Xi Xj < 2h. 2. Compute C(h, k), the graphical connected components of Gh,k. OUTPUT: {C(h, k), k N}. |
| Open Source Code | No | The paper does not contain any explicit statements about code release, nor does it provide links to code repositories. The license information provided relates to the paper itself, not to any accompanying code. |
| Open Datasets | No | The paper focuses on theoretical analysis of the DBSCAN algorithm and density-based clustering. It uses a generic 'i.i.d. sample {Xi}n i=1 of points with common distribution P' as input for its algorithms and theoretical derivations, but does not use or reference any specific public or open datasets for empirical evaluation. |
| Dataset Splits | No | The paper is a theoretical work focusing on minimax rates and consistency of clustering algorithms. It does not conduct empirical experiments or use specific datasets, therefore, there is no mention of dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical, focusing on mathematical analysis and algorithm properties, not on experimental performance. Therefore, it does not mention any specific hardware used for computations or experiments. |
| Software Dependencies | No | The paper is a theoretical study of the DBSCAN algorithm. It describes algorithmic procedures and their mathematical properties but does not detail any specific software implementations or dependencies with version numbers for reproducing empirical results. |
| Experiment Setup | No | The paper is a theoretical work on density-based clustering, deriving minimax rates and consistency for the DBSCAN algorithm. It does not present empirical experiments, and thus, there are no details regarding hyperparameter values, training configurations, or other system-level settings for an experimental setup. |