AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm
Authors: Marius Huber, Sara Kalisnik Hintz, Patrick Schnider
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. |
| Researcher Affiliation | Academia | Marius Huber, Department of Computational Linguistics, University of Zürich; Sara Kališnik, Department of Mathematics, Pennsylvania State University; Patrick Schnider, Department of Mathematics and Computer Science, University of Basel, and Department of Computer Science, ETH Zürich |
| Pseudocode | Yes | For a description of AuToMATo in pseudocode, see Algorithm 1. |
| Open Source Code | Yes | Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture. ... The code is archived on Zenodo (doi.org/10.5281/zenodo.17279741) and developed openly on GitHub (github.com/m-a-huber/automato_paper). |
| Open Datasets | Yes | The data sets on which we ran AuToMATo and the above clustering algorithms stem from the Clustering Benchmarks suite (Gagolewski, 2022). ... The data set is available as part of the locfit R-package (Loader, 2024). |
| Dataset Splits | No | The paper uses various benchmark datasets and mentions min-max scaling and comparison against ground truth. However, it does not explicitly describe training/test/validation splits for these datasets. For clustering benchmarks, the common practice is to cluster the entire dataset and compare it to a provided ground truth, rather than splitting into train/test sets for model development. |
| Hardware Specification | Yes | We ran our experiments on a laptop with a 12th Gen Intel Core i7-1260P processor running at 2.10GHz. |
| Software Dependencies | No | The paper mentions using Python, scikit-learn, and refers to implementations of ToMATo and bottleneck distance from GUDHI (Glisse, 2025; Godi, 2025). However, specific version numbers for scikit-learn or Python are not provided. The references to Glisse (2025) and Godi (2025) appear to be documentation or future releases, not specific library versions used for their implementation. |
| Experiment Setup | Yes | By default, AuToMATo performs the bootstrap on B = 1000 subsamples of the input point cloud, and sets the confidence level to α = 0.35. ... AuToMATo uses the k-nearest neighbor graph and the (logarithm of the) distance-to-measure density estimators by default, each with k = 10. ... We set the hyperparameters of the HDBSCAN, FINCH and the TTK-algorithm to their default values (as per their respective implementations). In contrast to this, we let the distance threshold parameter for the DBSCAN and the hierarchical clustering algorithms vary from 0.05 to 1.00 in increments of 0.05. |
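The baseline sweep described in the experiment-setup row (DBSCAN's distance threshold varied from 0.05 to 1.00 in increments of 0.05 on min-max-scaled data, scored against ground truth) can be sketched in scikit-learn. The synthetic blob data and the adjusted Rand index are illustrative stand-ins, not the paper's benchmark suite or its exact evaluation metric:

```python
# Sketch of a DBSCAN eps sweep over 0.05..1.00 in steps of 0.05,
# with min-max scaling as described in the paper's setup.
# Dataset and scoring metric here are assumptions for illustration.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import MinMaxScaler

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
X = MinMaxScaler().fit_transform(X)  # scale each feature to [0, 1]

eps_grid = np.arange(0.05, 1.0001, 0.05)  # 0.05, 0.10, ..., 1.00 (20 values)
scores = {}
for eps in eps_grid:
    labels = DBSCAN(eps=eps).fit_predict(X)
    scores[round(float(eps), 2)] = adjusted_rand_score(y_true, labels)

best_eps = max(scores, key=scores.get)
print(f"best eps = {best_eps}, ARI = {scores[best_eps]:.3f}")
```

Reporting the best score over the grid matches the paper's point of comparison: AuToMATo with fixed defaults is measured against the *best* threshold choice for DBSCAN and hierarchical clustering.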