AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm

Authors: Marius Huber, Sara Kalisnik Hintz, Patrick Schnider

TMLR 2025

Reproducibility checklist: each entry lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms."
Researcher Affiliation: Academia. Marius Huber (EMAIL), Department of Computational Linguistics, University of Zürich; Sara Kališnik (EMAIL), Department of Mathematics, Pennsylvania State University; Patrick Schnider (EMAIL), Department of Mathematics and Computer Science, University of Basel, and Department of Computer Science, ETH Zürich.
Pseudocode: Yes. "For a description of AuToMATo in pseudocode, see Algorithm 1."
Open Source Code: Yes. "Finally, we provide an open-source implementation of AuToMATo in Python that is fully compatible with the standard scikit-learn architecture. ... The code is archived on Zenodo (doi.org/10.5281/zenodo.17279741) and developed openly on GitHub (github.com/m-a-huber/automato_paper)."
Open Datasets: Yes. "The data sets on which we ran AuToMATo and the above clustering algorithms stem from the Clustering Benchmarks suite (Gagolewski, 2022). ... The data set is available as part of the locfit R-package (Loader, 2024)."
Dataset Splits: No. The paper uses various benchmark datasets and mentions min-max scaling and comparison against ground truth, but it does not explicitly describe training/validation/test splits. For clustering benchmarks, the common practice is to cluster the entire dataset and compare the result to a provided ground truth, rather than to split the data for model development.
Hardware Specification: Yes. "We ran our experiments on a laptop with a 12th Gen Intel Core i7-1260P processor running at 2.10GHz."
Software Dependencies: No. The paper mentions using Python and scikit-learn, and refers to implementations of ToMATo and the bottleneck distance from GUDHI (Glisse, 2025; Godi, 2025). However, specific version numbers for Python or scikit-learn are not provided, and the references to Glisse (2025) and Godi (2025) appear to point to documentation or future releases rather than to the specific library versions used in the implementation.
Experiment Setup: Yes. "By default, AuToMATo performs the bootstrap on B = 1000 subsamples of the input point cloud, and sets the confidence level to α = 0.35. ... AuToMATo uses the k-nearest neighbor graph and the (logarithm of the) distance-to-measure density estimators by default, each with k = 10. ... We set the hyperparameters of the HDBSCAN, FINCH and the TTK-algorithm to their default values (as per their respective implementations). In contrast to this, we let the distance threshold parameter for the DBSCAN and the hierarchical clustering algorithms vary from 0.05 to 1.00 in increments of 0.05."
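The evaluation protocol noted under "Dataset Splits" (min-max scale the point cloud, cluster the entire data set with no train/test split, then score the labeling against the provided ground truth) can be sketched as follows. This is an illustrative sketch only: the toy data, the use of KMeans as a stand-in clusterer, and the adjusted Rand index as the agreement score are assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Toy point cloud with known ground-truth labels (two well-separated blobs).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (50, 2)), rng.normal(3.0, 0.2, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)

# Min-max scale each coordinate to [0, 1], as the benchmark protocol describes.
X_scaled = MinMaxScaler().fit_transform(X)

# Cluster the *entire* data set (no split) and compare to the ground truth.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
score = adjusted_rand_score(y_true, labels)
```

Because the clustering itself is unsupervised, the ground-truth labels enter only at scoring time, which is why no held-out split is needed.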
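The DBSCAN baseline described in the experiment setup, where the distance threshold is swept from 0.05 to 1.00 in increments of 0.05 and the best-scoring value is retained, might be implemented along these lines. The toy data, the `min_samples` value, and the adjusted Rand index as the scoring metric are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import MinMaxScaler

# Toy benchmark data set: two tight blobs with known labels, min-max scaled.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.05, (40, 2)), rng.normal(0.8, 0.05, (40, 2))])
y_true = np.array([0] * 40 + [1] * 40)
X = MinMaxScaler().fit_transform(X)

# Sweep eps over 0.05, 0.10, ..., 1.00 and keep the value that scores best
# against the ground truth -- the "best selection of parameters" baseline.
eps_grid = np.arange(0.05, 1.025, 0.05)
best_eps, best_score = None, -1.0
for eps in eps_grid:
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    score = adjusted_rand_score(y_true, labels)
    if score > best_score:
        best_eps, best_score = eps, score
```

This oracle-style sweep gives each baseline its best possible parameter per data set, which is what makes AuToMATo's reported wins against it (with its own parameters fixed to defaults) a strong result.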