Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel
Authors: Ye Zhu, Kai Ming Ting
JAIR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents the result of utility evaluation of isolation kernel and Gaussian kernel in t-SNE using 21 real-world datasets with different data sizes and dimensions. We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Table 5 shows the results of the two kernels used in t-SNE. The Isolation kernel performs better on 18 out of 21 datasets in terms of AUC_RNX... |
| Researcher Affiliation | Academia | Ye Zhu (EMAIL), School of Information Technology, Deakin University, Burwood, Victoria, Australia 3125; Kai Ming Ting (EMAIL), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China 210023 |
| Pseudocode | Yes | The procedure of t-SNE is provided in Algorithm 1. ... Algorithm 2 t-SNE(D, ψ, m) which employs the Isolation kernel |
| Open Source Code | Yes | A demonstration of using t-SNE with Isolation kernel can be obtained from https://github.com/zhuye88/IKt-sne. |
| Open Datasets | Yes | COIL20, Human Activity and Isolet are from (Li, Cheng, Wang, Morstatter, Robert, Tang, & Liu, 2016); News20 and Rcv1 are from (Chang & Lin, 2011); and all other real-world datasets are from UCI Machine Learning Repository (Dua & Graff, 2017). |
| Dataset Splits | No | The paper mentions using either the MNIST dataset or a subsample of 10,000 data points from MNIST8M for processing, but does not specify explicit training/test/validation splits with percentages, sample counts, or citations to predefined splits for these or other datasets. |
| Hardware Specification | Yes | All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory. |
| Software Dependencies | Yes | All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory. |
| Experiment Setup | Yes | We report the best performance of each algorithm with a systematic parameter search with the range shown in Table 4. Note that there is only one manual parameter ψ to control the partitioning mechanism, and the other parameter t can be fixed to a default number. Parameters with search range: Gaussian kernel: perplexity ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}; tolerance = 0.00005. Isolation kernel: ψ ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}; t = 200. |
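
The search range quoted in the Experiment Setup row can be enumerated for a dataset of size n. The following is a minimal Python sketch for illustration only; the paper's experiments were implemented in Matlab. The function name `search_grid` is hypothetical, the step of 4 is inferred from the pattern 1, 5, ..., 97, and rounding the fractional values 0.01n, ..., 0.97n to integers is an assumption that the quoted text does not state.

```python
def search_grid(n, step=4):
    """Candidate values for perplexity (Gaussian kernel) or psi (Isolation kernel).

    Assumptions (not stated in the source): the step between candidates is 4,
    fractional candidates are rounded to the nearest integer, and candidates
    that are not smaller than n are dropped.
    """
    # Absolute candidates: 1, 5, ..., 97
    absolute = list(range(1, 98, step))
    # Relative candidates: 0.01n, 0.05n, ..., 0.97n, rounded to integers
    relative = [max(1, round(k / 100 * n)) for k in range(1, 98, step)]
    # Merge both lists, remove duplicates, keep only values below n
    return sorted({v for v in absolute + relative if v < n})


if __name__ == "__main__":
    grid = search_grid(10_000)
    print(len(grid), grid[:5], grid[-5:])
```

Each value in the returned list would be tried once per kernel, with tolerance = 0.00005 for the Gaussian kernel and t = 200 for the Isolation kernel held fixed as in the quoted setup.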