Improving the Effectiveness and Efficiency of Stochastic Neighbour Embedding with Isolation Kernel

Authors: Ye Zhu, Kai Ming Ting

JAIR 2021

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence: "This section presents the results of the utility evaluation of the Isolation kernel and the Gaussian kernel in t-SNE using 21 real-world datasets of varying sizes and dimensions. We report the best performance of each algorithm from a systematic parameter search over the ranges shown in Table 4. Table 5 shows the results of the two kernels used in t-SNE. The Isolation kernel performs better on 18 out of 21 datasets in terms of AUCRNX..."
Researcher Affiliation: Academia. Evidence: "Ye Zhu, School of Information Technology, Deakin University, Burwood, Victoria, Australia 3125; Kai Ming Ting, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China 210023"
Pseudocode: Yes. Evidence: "The procedure of t-SNE is provided in Algorithm 1. ... Algorithm 2: t-SNE(D, ψ, m), which employs the Isolation kernel"
Open Source Code: Yes. Evidence: "A demonstration of using t-SNE with Isolation kernel can be obtained from https://github.com/zhuye88/IKt-sne."
Open Datasets: Yes. Evidence: "COIL20, Human Activity and Isolet are from (Li, Cheng, Wang, Morstatter, Robert, Tang, & Liu, 2016); News20 and Rcv1 are from (Chang & Lin, 2011); and all other real-world datasets are from the UCI Machine Learning Repository (Dua & Graff, 2017)."
Dataset Splits: No. The paper mentions using either the MNIST dataset or a subsample of 10,000 data points from MNIST8M, but it does not specify explicit training/validation/test splits (percentages, sample counts, or citations to predefined splits) for these or any other datasets.
Hardware Specification: Yes. Evidence: "All algorithms used in the following experiments were implemented in Matlab 2019b and were run on a machine with 14 cores (Intel Xeon E5-2690 v4 @ 2.59 GHz) and 256GB memory."
Software Dependencies: Yes. Evidence: "All algorithms used in the following experiments were implemented in Matlab 2019b..."
Experiment Setup: Yes. Evidence: "We report the best performance of each algorithm from a systematic parameter search over the ranges shown in Table 4. Note that there is only one manual parameter, ψ, to control the partitioning mechanism; the other parameter, t, can be fixed to a default value."
Parameter search ranges (Table 4):
  Gaussian kernel: perplexity ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}; tolerance = 0.00005
  Isolation kernel: ψ ∈ {1, 5, ..., 97, 0.01n, 0.05n, ..., 0.97n}; t = 200
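For concreteness, the search grid described above (absolute values 1, 5, ..., 97 in steps of 4, plus dataset-size-relative values 0.01n, 0.05n, ..., 0.97n) could be reconstructed as follows. This is a hedged sketch, not the authors' code: the helper name `parameter_grid` and the deduplication/clipping choices are our own assumptions; only the value ranges come from Table 4 as quoted.

```python
def parameter_grid(n):
    """Candidate values for perplexity (Gaussian kernel) or psi
    (Isolation kernel) for a dataset of n points, per Table 4."""
    # Absolute candidates: 1, 5, ..., 97 (step 4)
    absolute = list(range(1, 98, 4))
    # Relative candidates: 0.01n, 0.05n, ..., 0.97n, rounded to integers
    relative = [round((x / 100) * n) for x in range(1, 98, 4)]
    # Deduplicate and keep only values valid for the dataset size
    # (this clipping is an assumption, not stated in the paper)
    return sorted({v for v in absolute + relative if 1 <= v < n})

grid = parameter_grid(10000)  # e.g. the 10,000-point MNIST8M subsample
```

Each candidate value would then be evaluated (e.g. by AUCRNX) and the best-performing setting reported, matching the "best performance ... with a systematic parameter search" protocol quoted above.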