Towards a Robust Persistence Diagram via Data-dependent Kernel

Authors: Hang Zhang, Kaifeng Zhang, Kai Ming Ting, Ye Zhu

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation reveals that (i) the proposed kernel provides a better means for UMAP dimensionality reduction; (ii) the proposed filter function can significantly improve the performance of Topological Point Cloud Clustering; and (iii) the proposed filter function is a more effective way of constructing Persistence Diagrams for t-SNE visualization and SVM classification than three existing methods of TDA.
Researcher Affiliation | Academia | Hang Zhang, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China; Kaifeng Zhang, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China; Kai Ming Ting, National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China; Ye Zhu, School of IT, Deakin University, Australia.
Pseudocode | No | The paper describes methods and processes in narrative text and numbered lists, but does not present any explicitly labeled pseudocode blocks or algorithm boxes.
Open Source Code | Yes | Code is available at https://github.com/IsolationKernel/Codes/tree/main/Lambda-kernel.
Open Datasets | Yes | We compare the performance of UMAP and Λ-MAP on six real datasets and one artificial dataset. The dimensionality reduction results on seven datasets are shown in Table 3. We verify the behaviors of Λ-filter and DTM with respect to Definition 14 on the Cassini dataset (Chazal, Fasy, et al. 2017). Finally, we feed Δ_max = max{Δ_0, Δ_1} to t-SNE (Van der Maaten and Hinton 2008) and get the visualization result, which is shown in Figure 5. The dataset we used consists of 150 images (or point clouds C_1, ..., C_150) from 3 types of cells in tumor regions (Vipond et al. 2021). In this experiment, we examine the use of PDs in a classification task on a bone-scripts dataset, as shown in Figure 4a. (Footnote 5: The dataset is available at http://jgw.aynu.edu.cn/.) Our experiment is conducted on the real dataset MPEG7 (Latecki et al. 2000; Vishwanath et al. 2020), which contains 70 shape categories with 20 different images for each category. (Footnote 11: The dataset is available at https://github.com/sidv23/robust-PDs.)
Dataset Splits | Yes | We vary the PI bandwidth from 0.1 to 0.4, and report the mean classification accuracy and the corresponding standard deviation of 10 random train/test splits for each PI bandwidth. All the methods are relatively stable with respect to the bandwidth. But in terms of classification accuracy, Λ-filter outperforms the other three methods for every bandwidth, as shown in Figure 6a. In each split, we take 70% of the whole dataset for training and 30% for testing. 3-fold cross-validation on the training set is used to select the best hyperparameters for each approach.
Hardware Specification | Yes | The experiments are performed on a machine with 1500MHz CPUs and 2TB RAM.
Software Dependencies | No | The paper mentions using an SVM classifier and UMAP, but does not specify exact version numbers for these or any other software libraries/dependencies.
Experiment Setup | Yes | Parameter settings used in the experiments: For Λ-kernel, t = 200, η = , and ψ is searched over {2, 4, 8, 16, 32}. For DTM and CkNN, k is searched in {m·n | m = 0.02, 0.04, 0.06, 0.08, 0.1}, where n is the dataset size. For UMAP, the number of neighbors is searched in [5, 10, 20, 50, 100, 200]. For Λ-MAP, ψ is searched in [2, 4, 8, 16, 32, 64], η = , and t = 500.
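
The split protocol and search grids quoted in the Dataset Splits and Experiment Setup rows can be written down as a small bookkeeping sketch. This is not the authors' code. The function names, the seeds, the rounding of k to an integer, and the way folds are assigned are all assumptions made for illustration; the source states only the grids, the 70%/30% split, the 10 repetitions, and the 3-fold cross-validation. It uses the Python standard library only and does not compute any kernel or persistence diagram.

```python
import random

# Search grids as listed in the Experiment Setup row.
PSI_GRID_KERNEL = [2, 4, 8, 16, 32]          # psi for the Lambda-kernel
PSI_GRID_MAP = [2, 4, 8, 16, 32, 64]         # psi for Lambda-MAP
UMAP_NEIGHBORS = [5, 10, 20, 50, 100, 200]   # number of neighbors for UMAP
K_FRACTIONS = [0.02, 0.04, 0.06, 0.08, 0.1]  # m in k = m * n (DTM)


def k_grid(n):
    """Candidate k values for a dataset of size n.

    Rounding to the nearest integer (with a floor of 1) is an assumption;
    the source only gives k = m * n.
    """
    return [max(1, round(m * n)) for m in K_FRACTIONS]


def random_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def k_folds(indices, n_folds=3):
    """Yield (fit, validate) index lists for cross-validation on the training part."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val in enumerate(folds):
        fit = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield fit, val


def evaluation_plan(n, n_splits=10):
    """Return the (train, test) index pairs for the repeated random splits."""
    return [random_split(n, seed=s) for s in range(n_splits)]


if __name__ == "__main__":
    n = 1400  # 70 shape categories x 20 images, as quoted for MPEG7
    print("k grid:", k_grid(n))
    for train, test in evaluation_plan(n)[:1]:
        print("train/test sizes:", len(train), len(test))
        for fit, val in k_folds(train):
            print("  fold sizes:", len(fit), len(val))
```

In a full pipeline, each (fit, validate) pair would score every grid value, the best value would be refit on the whole training part, and the test accuracy would be averaged over the 10 splits to give the reported mean and standard deviation.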