reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards a Robust Persistence Diagram via Data-dependent Kernel

Authors: Hang Zhang, Kaifeng Zhang, Kai Ming Ting, Ye Zhu

JAIR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical evaluation reveals that (i) the proposed kernel provides a better means for UMAP dimensionality reduction; (ii) the proposed filter function can significantly improve the performance of Topological Point Cloud Clustering; (iii) the proposed filter function is a more effective way of constructing Persistence Diagram for t-SNE visualization and SVM classification than three existing methods of TDA.
Researcher Affiliation	Academia	HANG ZHANG, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China KAIFENG ZHANG, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China KAI MING TING, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China YE ZHU, School of IT, Deakin University, Australia
Pseudocode	No	The paper describes methods and processes in narrative text and numbered lists, but does not present any explicitly labeled pseudocode blocks or algorithm boxes.
Open Source Code	Yes	Code is available at https://github.com/IsolationKernel/Codes/tree/main/Lambda-kernel.
Open Datasets	Yes	We compare the performance of UMAP and Λ-MAP on six real datasets and one artificial dataset. The dimensionality reduction results on seven datasets are shown in Table 3. We verify the behaviors of Λ-filter and DTM with respect to Definition 14 on the Cassini dataset (Chazal, Fasy, et al. 2017). Finally, we feed Δ_max = max{Δ_0, Δ_1} to t-SNE (Van der Maaten and Hinton 2008) and get the visualization result, which is shown in Figure 5. The dataset we used consists of 150 images (or point clouds C_1, ..., C_150) from 3 types of cells in tumor regions (Vipond et al. 2021). In this experiment, we examine the use of PDs in a classification task on a bone-scripts dataset (footnote 5), as shown in Figure 4a. Footnote 5: The dataset is available at http://jgw.aynu.edu.cn/. Our experiment is conducted on real dataset MPEG7 (footnote 11) (Latecki et al. 2000; Vishwanath et al. 2020), which contains 70 shape categories with 20 different images for each category. Footnote 11: The dataset is available at https://github.com/sidv23/robust-PDs.
Dataset Splits	Yes	We vary the PI bandwidth from 0.1 to 0.4, and report the mean classification accuracy and the corresponding standard deviation of 10 random train/test splits for each PI bandwidth. All the methods are relatively stable with respect to the bandwidth. But in terms of classification accuracy, Λ-filter outperforms the other three methods for every bandwidth, as shown in Figure 6a. In each split, we take 70% of the whole dataset for training and 30% for testing. 3-fold cross-validation on the training set is used to select the best hyperparameters for each approach.
Hardware Specification	Yes	The experiments are performed on a machine with 1500MHz CPUs and 2TB RAM.
Software Dependencies	No	The paper mentions using an SVM classifier and UMAP, but does not specify exact version numbers for these or any other software libraries/dependencies.
Experiment Setup	Yes	Parameter setting used in the experiments: For Λ-kernel, 𝑡 = 200, 𝜂 = , 𝜓 is searched over {2, 4, 8, 16, 32}. For DTM and CkNN, 𝑘 is searched in {𝑚𝑛 | 𝑚 = 0.02, 0.04, 0.06, 0.08, 0.1}, where 𝑛 is the dataset size. For UMAP, the number of neighbors is searched in [5, 10, 20, 50, 100, 200]. For Λ-MAP, 𝜓 is searched in [2, 4, 8, 16, 32, 64], 𝜂 = and 𝑡 = 500.
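
The evaluation protocol quoted in the Dataset Splits row (10 random 70%/30% train/test splits, with 3-fold cross-validation on the training set to select hyperparameters, reporting mean and standard deviation of test accuracy) can be sketched roughly as below. This is a minimal illustration, not the authors' code: the function names, the `fit_and_score` callback, the seed, and the toy threshold classifier that stands in for the SVM are all hypothetical, and the splits are assumed to be unstratified because the quoted text does not say otherwise.

```python
import random
import statistics


def split_indices(n, train_frac, rng):
    """Shuffle 0..n-1 and cut it into train and test index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def k_folds(indices, k):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def evaluate(n, grid, fit_and_score, n_splits=10, train_frac=0.7,
             n_folds=3, seed=0):
    """Mean and standard deviation of test accuracy over random splits.

    fit_and_score(param, fit_idx, eval_idx) is a hypothetical callback that
    trains on fit_idx with the given hyperparameter and returns the accuracy
    on eval_idx.
    """
    rng = random.Random(seed)
    test_scores = []
    for _ in range(n_splits):
        train, test = split_indices(n, train_frac, rng)

        def cv_score(param):
            # Hyperparameters are chosen using the training set only.
            scores = [fit_and_score(param, fit, val)
                      for fit, val in k_folds(train, n_folds)]
            return statistics.mean(scores)

        best = max(grid, key=cv_score)
        test_scores.append(fit_and_score(best, train, test))
    return statistics.mean(test_scores), statistics.stdev(test_scores)


# Toy stand-in for the classifier: 1-D points labelled by whether x > 0.5.
xs = [i / 100 for i in range(100)]
ys = [int(x > 0.5) for x in xs]


def threshold_accuracy(threshold, fit_idx, eval_idx):
    # The toy model has nothing to fit; the "hyperparameter" is the threshold.
    hits = sum(int(xs[i] > threshold) == ys[i] for i in eval_idx)
    return hits / len(eval_idx)


mean_acc, std_acc = evaluate(len(xs), [0.5, 0.3, 0.7], threshold_accuracy)
print(mean_acc, std_acc)
```

In a real run, `fit_and_score` would train the classifier on persistence images for the given indices, and `grid` would hold the candidate values listed in the Experiment Setup row.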