reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Outlier-Robust Subsampling Techniques for Persistent Homology

Authors: Bernadette J. Stolz

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test our landmarks on artiﬁcial data sets which contain diﬀerent levels of noise and compare them to standard landmark selection techniques. We demonstrate that our landmark selection outperforms standard methods as well as a subsampling technique based on an outlier-robust version of the k-means algorithm for low sampling densities in noisy data with respect to robustness to outliers.
Researcher Affiliation	Academia	Bernadette J. Stolz1,2 EMAIL 1 Ecole Polytechnique F ed erale de Lausanne Station 8, 1015 Lausanne, Switzerland 2 Mathematical Institute University of Oxford, Woodstock Rd Oxford OX2 6GG, United Kingdom
Pseudocode	Yes	We show pseudocode for the procedure in Algorithm 3 in Appendix A.1. We show our modiﬁed version of the k-means algorithm for landmark selection in Algorithm 2. We show the pseudocode for PH landmarks I in Algorithm 1, an algorithm for PH landmarks II can be formulated accordingly.
Open Source Code	Yes	We make an improved python version of our code for PH landmarks available on https://github.com/stolzbernadette/Outlier-robust-subsampling-techniques-f or-persistent-homology.
Open Datasets	No	We introduce the data sets to which we apply the diﬀerent landmark selection methods. The data sets are chosen to be simple to allow us to determine eﬀects of the landmark selection. The data sets consist of signal points that are sampled from a topologically interesting structure a sphere, a Klein bottle, or a torus and noise points that we design to be topologically diﬀerent from the signal. We consider four 3-dimensional data sets. ... We use the Laplacian random number generator code (Chen, 2019) to generate α.
Dataset Splits	No	For a given number of points N and probability p we sample signal points uniformly at random from the surface of the unit sphere with probability p and noise points from the (ﬁlled) cube [ 1, 1]3 R3 with probability 1 p. This corresponds to a landmark sampling density of m/N. For the k landmarks, we deﬁne the number of clusters to be k = pm and the number of outliers to be j = (1 p)m, where p is the probability with which a point in the respective data set was sampled from the signal data, i.e. the sphere, torus, or Klein bottle.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	Yes	We implement PH landmark selection method in Matlab using Ripser (Bauer, 2021) for the computation of the local Vietoris-Rips complexes. We also implement the k landmarks algorithm in Matlab. For the calculation of maxmin landmarks, random landmarks, and dense core subsets we use the inbuilt functions in the java Plex package (Tausz et al., 2014).
Experiment Setup	Yes	For the k landmarks, we deﬁne the number of clusters to be k = pm and the number of outliers to be j = (1 p)m, where p is the probability with which a point in the respective data set was sampled from the signal data, i.e. the sphere, torus, or Klein bottle. ... For the PH landmarks we choose δ = 0.2 for the 3-dimensional data sets, δ = 0.5 for the torus data and δ = 0.6 for the Klein bottle. In all our data sets, we ﬁnd that when looking for the maximal persistence of a feature across all dimensions, the value of out0,1,2 PH (y) is exclusively determined by dimension 0. ... For the maxmin, random and k landmarks we show the average fraction of signal points and its standard deviation across 20 realisations of the selection algorithms.