Towards a Robust Persistence Diagram via Data-dependent Kernel
Authors: Hang Zhang, Kaifeng Zhang, Kai Ming Ting, Ye Zhu
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation reveals that (i) the proposed kernel provides a better means for UMAP dimensionality reduction; (ii) the proposed filter function can significantly improve the performance of Topological Point Cloud Clustering; and (iii) the proposed filter function is a more effective way of constructing a Persistence Diagram for t-SNE visualization and SVM classification than three existing methods of TDA. |
| Researcher Affiliation | Academia | Hang Zhang, Kaifeng Zhang, and Kai Ming Ting: National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China; Ye Zhu: School of IT, Deakin University, Australia |
| Pseudocode | No | The paper describes methods and processes in narrative text and numbered lists, but does not present any explicitly labeled pseudocode blocks or algorithm boxes. |
| Open Source Code | Yes | Code is available at https://github.com/IsolationKernel/Codes/tree/main/Lambda-kernel. |
| Open Datasets | Yes | We compare the performance of UMAP and Λ-MAP on six real datasets and one artificial dataset. The dimensionality reduction results on seven datasets are shown in Table 3. We verify the behaviors of Λ-filter and DTM with respect to Definition 14 on the Cassini dataset (Chazal, Fasy, et al. 2017). Finally, we feed Λmax = max{Λ0, Λ1} to t-SNE (Van der Maaten and Hinton 2008) and get the visualization result, which is shown in Figure 5. The dataset we used consists of 150 images (or point clouds C1, ..., C150) from 3 types of cells in tumor regions (Vipond et al. 2021). In this experiment, we examine the use of PDs in a classification task on a bone-scripts dataset (available at http://jgw.aynu.edu.cn/), as shown in Figure 4a. Our experiment is conducted on the real dataset MPEG7 (Latecki et al. 2000; Vishwanath et al. 2020), available at https://github.com/sidv23/robust-PDs, which contains 70 shape categories with 20 different images for each category. |
| Dataset Splits | Yes | We vary the PI bandwidth from 0.1 to 0.4, and report the mean classification accuracy and the corresponding standard deviation of 10 random train/test splits for each PI bandwidth. All the methods are relatively stable with respect to the bandwidth. But in terms of classification accuracy, Λ-filter outperforms the other three methods for every bandwidth, as shown in Figure 6a. In each split, we take 70% of the whole dataset for training and 30% for testing. 3-fold cross-validation on the training set is used to select the best hyperparameters for each approach. |
| Hardware Specification | Yes | The experiments are performed on a machine with 1500 MHz CPUs and 2 TB RAM. |
| Software Dependencies | No | The paper mentions using an SVM classifier and UMAP, but does not specify exact version numbers for these or any other software libraries/dependencies. |
| Experiment Setup | Yes | Parameter settings used in the experiments: For the Λ-kernel, t = 200 and ψ is searched over {2, 4, 8, 16, 32}. For DTM and CkNN, the parameter is searched in {0.02n, 0.04n, 0.06n, 0.08n, 0.1n}, where n is the dataset size. For UMAP, the number of neighbors is searched in [5, 10, 20, 50, 100, 200]. For Λ-MAP, ψ is searched in [2, 4, 8, 16, 32, 64] and t = 500. |
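The evaluation protocol quoted under "Dataset Splits" (10 random 70/30 train/test splits, with 3-fold cross-validation on the training portion to select hyperparameters, reporting mean accuracy and standard deviation) can be sketched as follows. This is a hypothetical reconstruction using scikit-learn, not the authors' code; the SVM hyperparameter grid and the random features standing in for persistence-image vectors are assumptions.

```python
# Sketch of the reported protocol (an assumption, not the authors' code):
# 10 random stratified 70/30 splits; in each split, 3-fold CV on the
# training set picks SVM hyperparameters; mean/std accuracy is reported.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

def evaluate(X, y, n_splits=10, seed=0):
    """X: feature vectors (e.g. vectorized persistence images), y: labels."""
    accs = []
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed + i, stratify=y)
        # 3-fold cross-validation on the training set only.
        grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10],
                                    "gamma": ["scale", 0.1]}, cv=3)
        grid.fit(X_tr, y_tr)
        accs.append(grid.score(X_te, y_te))
    return float(np.mean(accs)), float(np.std(accs))

# Toy usage with random features standing in for persistence images.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(2, 1, (30, 8))])
y = np.array([0] * 30 + [1] * 30)
mean_acc, std_acc = evaluate(X, y)
```

The key design point mirrored from the paper's description is that hyperparameter selection happens strictly inside each training split, so the held-out 30% never influences model choice.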