reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DBSCAN: Optimal Rates For Density-Based Cluster Estimation

Authors: Daren Wang, Xinyang Lu, Alessandro Rinaldo

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the problem of optimal estimation of the density cluster tree under various smoothness assumptions on the underlying density. ... we formulate a new notion of clustering consistency ... and derive minimax rates for cluster tree estimation under H older smooth densities of arbitrary degree. We present a computationally eﬃcient, rate optimal cluster tree estimator based on simple extensions of the popular DBSCAN algorithm ... Our procedure relies on kernel density estimators and returns a sequence of nested random geometric graphs ... The resulting optimal rates for cluster tree estimation depend on the degree of smoothness ... Our results complement and extend the analysis of the DBSCAN algorithm ... Finally, we consider level set estimation and cluster consistency for densities with jump discontinuities. We demonstrate that the DBSCAN algorithm attains the minimax rate in terms of the jump size and sample size in this setting as well.
Researcher Affiliation	Academia	Daren Wang EMAIL Department of Statistics University of Chicago Chicago, IL 60637, USA; Xinyang Lu EMAIL Mathematical Sciences Department Lakehead University Thunder Bay, ON P7B 5E1, Canada; Alessandro Rinaldo EMAIL Department of Statistics and Data Science Carnegie Mellon University Pittsburgh, PA 15213, USA
Pseudocode	Yes	Algorithm 1 The DBSCAN algorithm. INPUT: i.i.d sample {Xi}n i=1, and h > 0. 1. For each k N, construct a graph Gh,k with nodes {Xi : \|B(Xi, h) {Xj}n j=1\| k} and edges (Xi, Xj) if Xi Xj < 2h. 2. Compute C(h, k), the graphical connected components of Gh,k. OUTPUT: {C(h, k), k N}.
Open Source Code	No	The paper does not contain any explicit statements about code release, nor does it provide links to code repositories. The license information provided relates to the paper itself, not to any accompanying code.
Open Datasets	No	The paper focuses on theoretical analysis of the DBSCAN algorithm and density-based clustering. It uses a generic 'i.i.d. sample {Xi}n i=1 of points with common distribution P' as input for its algorithms and theoretical derivations, but does not use or reference any specific public or open datasets for empirical evaluation.
Dataset Splits	No	The paper is a theoretical work focusing on minimax rates and consistency of clustering algorithms. It does not conduct empirical experiments or use specific datasets, therefore, there is no mention of dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical, focusing on mathematical analysis and algorithm properties, not on experimental performance. Therefore, it does not mention any specific hardware used for computations or experiments.
Software Dependencies	No	The paper is a theoretical study of the DBSCAN algorithm. It describes algorithmic procedures and their mathematical properties but does not detail any specific software implementations or dependencies with version numbers for reproducing empirical results.
Experiment Setup	No	The paper is a theoretical work on density-based clustering, deriving minimax rates and consistency for the DBSCAN algorithm. It does not present empirical experiments, and thus, there are no details regarding hyperparameter values, training configurations, or other system-level settings for an experimental setup.