Outlier-Robust Subsampling Techniques for Persistent Homology

Authors: Bernadette J. Stolz

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We test our landmarks on artificial data sets which contain different levels of noise and compare them to standard landmark selection techniques. We demonstrate that our landmark selection outperforms standard methods as well as a subsampling technique based on an outlier-robust version of the k-means algorithm for low sampling densities in noisy data with respect to robustness to outliers.
Researcher Affiliation Academia Bernadette J. Stolz1,2 EMAIL 1 Ecole Polytechnique F ed erale de Lausanne Station 8, 1015 Lausanne, Switzerland 2 Mathematical Institute University of Oxford, Woodstock Rd Oxford OX2 6GG, United Kingdom
Pseudocode Yes We show pseudocode for the procedure in Algorithm 3 in Appendix A.1. We show our modified version of the k-means algorithm for landmark selection in Algorithm 2. We show the pseudocode for PH landmarks I in Algorithm 1, an algorithm for PH landmarks II can be formulated accordingly.
Open Source Code Yes We make an improved python version of our code for PH landmarks available on https://github.com/stolzbernadette/Outlier-robust-subsampling-techniques-f or-persistent-homology.
Open Datasets No We introduce the data sets to which we apply the different landmark selection methods. The data sets are chosen to be simple to allow us to determine effects of the landmark selection. The data sets consist of signal points that are sampled from a topologically interesting structure a sphere, a Klein bottle, or a torus and noise points that we design to be topologically different from the signal. We consider four 3-dimensional data sets. ... We use the Laplacian random number generator code (Chen, 2019) to generate α.
Dataset Splits No For a given number of points N and probability p we sample signal points uniformly at random from the surface of the unit sphere with probability p and noise points from the (filled) cube [ 1, 1]3 R3 with probability 1 p. This corresponds to a landmark sampling density of m/N. For the k landmarks, we define the number of clusters to be k = pm and the number of outliers to be j = (1 p)m, where p is the probability with which a point in the respective data set was sampled from the signal data, i.e. the sphere, torus, or Klein bottle.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies Yes We implement PH landmark selection method in Matlab using Ripser (Bauer, 2021) for the computation of the local Vietoris-Rips complexes. We also implement the k landmarks algorithm in Matlab. For the calculation of maxmin landmarks, random landmarks, and dense core subsets we use the inbuilt functions in the java Plex package (Tausz et al., 2014).
Experiment Setup Yes For the k landmarks, we define the number of clusters to be k = pm and the number of outliers to be j = (1 p)m, where p is the probability with which a point in the respective data set was sampled from the signal data, i.e. the sphere, torus, or Klein bottle. ... For the PH landmarks we choose δ = 0.2 for the 3-dimensional data sets, δ = 0.5 for the torus data and δ = 0.6 for the Klein bottle. In all our data sets, we find that when looking for the maximal persistence of a feature across all dimensions, the value of out0,1,2 PH (y) is exclusively determined by dimension 0. ... For the maxmin, random and k landmarks we show the average fraction of signal points and its standard deviation across 20 realisations of the selection algorithms.