reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Statistical Topological Data Analysis using Persistence Landscapes

Authors: Peter Bubenik

JMLR 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show how a number of standard statistical tests can be used for statistical inference using this summary. We also prove that this summary is stable and that it can be used to provide lower bounds for the bottleneck and Wasserstein distances. In Section 4.1, "we sample 200 points from the uniform distribution on the union of two annuli. We then calculate the corresponding persistence landscape in degree one using the Vietoris-Rips complex. We repeat this 100 times and calculate the mean persistence landscape." In Section 4.2, "We sample this field on a 100 by 100 grid, and calculate the persistence landscape of the sublevel set... We calculate the mean persistence landscapes in degrees 0 and 1 from 100 samples." In Section 4.3, "Here we combine persistence landscapes and statistical inference to discriminate between iid samples of 1000 points from a torus and a sphere... We then calculate the persistence landscape of this filtered simplicial complex for 100 samples and plot the mean landscapes."
Researcher Affiliation	Academia	Peter Bubenik EMAIL Department of Mathematics Cleveland State University Cleveland, OH 44115-2214, USA
Pseudocode	No	The paper describes mathematical concepts and theoretical results with proofs. It does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	"The persistent homologies in this section were calculated using javaPlex (Tausz et al., 2011) and Perseus by Nanda (2013). Another publicly available alternative is Dionysus by Morozov (2012). In Section 4.2 we use Matlab code courtesy of Eliran Subag that implements an algorithm from Wood and Chan (1994)." These are third-party tools used by the author, not open-source code for the methodology presented in this paper.
Open Datasets	No	The paper uses generated data for its examples. For instance, in Section 4.1, "we sample 200 points from the uniform distribution on the union of two annuli." In Section 4.2, "we consider a stationary Gaussian random field on [0, 1]^2... We sample this field on a 100 by 100 grid." In Section 4.3, "we sample 1000 points from a torus and a sphere." No concrete access information for publicly available datasets is provided; instead, the author describes how the data for the experiments were generated.
Dataset Splits	No	The paper does not describe traditional training/test/validation dataset splits. Instead, it describes generating multiple samples or repetitions for statistical analysis, such as repeating a sampling process 100 times, but these are not dataset splits in the conventional machine learning sense for reproduction of specific data partitions.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies	No	"The persistent homologies in this section were calculated using javaPlex (Tausz et al., 2011) and Perseus by Nanda (2013). Another publicly available alternative is Dionysus by Morozov (2012). In Section 4.2 we use Matlab code courtesy of Eliran Subag that implements an algorithm from Wood and Chan (1994)." While software names are mentioned, no specific version numbers are provided for javaPlex, Perseus, Dionysus, or Matlab.
Experiment Setup	Yes	In Section 4.1, "We sample 200 points from the uniform distribution on the union of two annuli." In Section 4.2, "We sample this field on a 100 by 100 grid... We calculate the mean persistence landscapes... from 100 samples." In Section 4.3, "we sample 1000 points... we construct a filtered simplicial complex as follows. First we triangulate the underlying space using the Coxeter-Freudenthal-Kuhn triangulation, starting with a cubical grid with sides of length 1/2. Next we smooth our data using a triangular kernel with bandwidth 0.9. We evaluate this kernel density estimator at the vertices of our simplicial complex... calculate the persistence landscape of this filtered simplicial complex for 100 samples." The paper also mentions using "the permutation test with 10,000 repetitions".
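
The setup quoted above (persistence landscapes averaged over repeated samples, then compared with a permutation test) can be illustrated with a minimal Python sketch. This is not the author's code, which relied on javaPlex, Perseus, and Matlab. The function names `landscape` and `permutation_test`, the evaluation grid, the choice of test statistic (Euclidean norm of the difference between mean discretized landscapes), and the add-one p-value convention are all assumptions made for illustration. The sketch also assumes the birth-death pairs have already been computed by a persistent homology package.

```python
import numpy as np


def landscape(pairs, grid, k_max=3):
    """Discretized persistence landscape on `grid`.

    lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t))
    over all birth-death pairs (b, d). Returns shape (k_max, len(grid)).
    """
    pairs = np.asarray(pairs, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    b = pairs[:, 0][:, None]
    d = pairs[:, 1][:, None]
    # One "tent" function per pair, evaluated at every grid point.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - b, d - grid[None, :]))
    tents = -np.sort(-tents, axis=0)  # sort each column in descending order
    out = np.zeros((k_max, grid.size))
    m = min(k_max, tents.shape[0])
    out[:m] = tents[:m]
    return out


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sample permutation test on stacked landscapes.

    `x` and `y` hold one discretized landscape per sample. The statistic
    (norm of the difference of group means) is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)
    observed = np.linalg.norm(x.mean(axis=0) - y.mean(axis=0))
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random relabeling of samples
        stat = np.linalg.norm(
            pooled[idx[:n]].mean(axis=0) - pooled[idx[n:]].mean(axis=0)
        )
        count += stat >= observed
    # Add-one convention keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

A mean landscape for a group is then `np.mean(stack, axis=0)`, where `stack` holds one `landscape(...)` array per sample. The default of 10,000 permutations mirrors the repetition count quoted above; everything else would need to be checked against the paper before being used for reproduction.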