Adaptive Clustering Using Kernel Density Estimators
Authors: Ingo Steinwart, Bharath K. Sriperumbudur, Philipp Thomann
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we illustrate the behavior of our generic KDE-based clustering algorithm on a few artificial data sets for which the ground truth clustering can be computed. In addition, we compare its performance to k-means and hierarchical clustering. |
| Researcher Affiliation | Collaboration | Ingo Steinwart (EMAIL), University of Stuttgart, Department of Mathematics, D-70569 Stuttgart, Germany; Bharath K. Sriperumbudur (EMAIL), Pennsylvania State University, Department of Statistics, University Park, PA 16802, USA; Philipp Thomann (EMAIL), D ONE Solutions AG, Sihlfeldstrasse 58, 8003 Zürich, Switzerland |
| Pseudocode | Yes | Algorithm 1 Clustering with the help of a generic level set estimator Algorithm 2 Estimating the split-tree with the help of a generic level set estimator |
| Open Source Code | No | The paper does not provide access to its source code. It describes implementation details in the 'Algorithms' subsection of Section 7 but includes no link or explicit release statement. |
| Open Datasets | Yes | The first distribution, see Figure 9, is a mixture of 15 Gaussian distributions and a uniform background distribution. [...] Since this distribution was inspired by the S2-data set of Fränti and Virmajoki (2006) we will call it S2 in the following. |
| Dataset Splits | No | For each of the six cluster problems described above and the sample sizes n ∈ {2500, 3000, 3500, 4200, 5000, 6000, 7000, 8200, 10000, 14000, 20000}, we generated 100 data sets. In addition, we also computed the true densities of the 6 distributions on a 1000 × 1000 grid of [0, 1]² to find a high-resolution approximation of the ground truth clustering. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | Besides our methods we also considered k-means and hierarchical clustering. To this end, we used the functions kmeans, kmeans++, and hclust of R. |
| Experiment Setup | Yes | We considered 500 geometrically spaced candidate values of δ between c(ln(n)/n)^{1/d} and c(ln n)^{−1/d}, where in the experiments the factor c was determined by an estimate of the median mutual distance between the samples of the considered data set. [...] Moreover, we considered both a plain moving window kernel and the Epanechnikov kernel, where in both cases the underlying norm was the Euclidean distance. Since both kernels have bounded support, we simply chose σ := δ, see (24), and ε := 3 √(h_{D,δ} n^{−1} δ^{−d}) for each candidate value δ. [...] Finally, we decided to focus on thickness guarantees with the most natural choice γ := 1, [...] we choose τ := (2 + ϵ)δ with ϵ = 0.00001, where we note that our theoretical findings actually hold true for each value τ > 2δ [...]. In addition, kmeans was repeated with 100 random initializations using the parameter nstart = 100. |
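The candidate-bandwidth construction quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the scale factor c is the median pairwise Euclidean distance of the sample, and the candidate range is [c(ln(n)/n)^{1/d}, c(ln n)^{−1/d}] as quoted; the function name `delta_candidates` is ours.

```python
import numpy as np

def delta_candidates(X, num=500):
    """Geometrically spaced bandwidth candidates for a sample X of shape (n, d).

    Assumptions (ours, not verbatim from the paper): c is the median pairwise
    Euclidean distance, and the range is [c*(ln n / n)**(1/d), c*(ln n)**(-1/d)].
    """
    n, d = X.shape
    # Median mutual distance between samples (full pairwise matrix; for
    # large n one would estimate this on a subsample instead).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    c = np.median(dists[np.triu_indices(n, k=1)])
    lo = c * (np.log(n) / n) ** (1.0 / d)
    hi = c * np.log(n) ** (-1.0 / d)
    # 500 geometrically spaced values between the two endpoints.
    return np.geomspace(lo, hi, num=num)
```

Geometric (rather than linear) spacing matches the quoted setup and spends more candidates at small bandwidths, where level-set estimates change fastest.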