XAI Beyond Classification: Interpretable Neural Clustering

Authors: Xi Peng, Yunfan Li, Ivor W. Tsang, Hongyuan Zhu, Jiancheng Lv, Joey Tianyi Zhou

JMLR 2022

Reproducibility Variable: Result, followed by the supporting LLM Response

Research Type: Experimental
LLM Response: "In this section, we carry out experiments to verify the effectiveness of the proposed TELL comparing with 14 state-of-the-art clustering approaches. ... Our method is evaluated on the following three data sets, namely, the full MNIST handwritten digital database..., the full CIFAR-10 image database..., and the full CIFAR-100 image database... Three widely used metrics are used to evaluate the clustering performance, including Clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI)."

Researcher Affiliation: Academia
LLM Response: "1 College of Computer Science, Sichuan University, Chengdu, China. 2 Centre for Frontier Artificial Intelligence Research, A*STAR, Singapore. 3 Australian Artificial Intelligence Institute, University of Technology Sydney, Australia. 4 Institute for Infocomm Research, A*STAR, Singapore. 5 Institute of High Performance Computing, A*STAR, Singapore."

Pseudocode: No
LLM Response: The paper describes the proposed method and training process using mathematical formulations and descriptive text, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.

Open Source Code: Yes
LLM Response: "Extensive experiments show that our method achieves superior performance comparing with 14 clustering approaches on three challenging data sets. The source code could be accessed at www.pengxi.me."

Open Datasets: Yes
LLM Response: "Data sets: Our method is evaluated on the following three data sets, namely, the full MNIST handwritten digital database (Lecun et al., 1998), the full CIFAR-10 image database (Krizhevsky and Hinton, 2009), and the full CIFAR-100 image database (Krizhevsky and Hinton, 2009)."

Dataset Splits: Yes
LLM Response: "The training and test split are merged in all our experiments. ... The MNIST data set consists of 70,000 handwritten digits over 10 classes... The CIFAR-10 data set consists of 60,000 RGB images... The CIFAR-100 data set contains 60,000 RGB images... For CIFAR-100, we adopt its 20 super-classes as partitions. ... 1,000 digits are randomly sampled from the MNIST test set as the training data... 60,000 digits from the MNIST training set are used to evaluate the performance."

Hardware Specification: Yes
LLM Response: "All experiments are conducted on a Nvidia 2080Ti GPU with PyTorch 1.7.0 and CUDA 11.0."

Software Dependencies: Yes
LLM Response: "All experiments are conducted on a Nvidia 2080Ti GPU with PyTorch 1.7.0 and CUDA 11.0."

Experiment Setup: Yes
LLM Response: "Both the autoencoder and the cluster layer are randomly initialized with Kaiming uniform (He et al., 2015), which are then simultaneously trained for 3000 epochs with the default Adadelta (Zeiler, 2012) optimizer."
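The quoted evaluation protocol measures Clustering Accuracy (ACC) and Normalized Mutual Information (NMI). As a rough illustration of what these metrics compute, here is a minimal pure-Python sketch; it is not the paper's evaluation code, the function names are our own, and the brute-force permutation matching in ACC is a simplification that is practical only for a small number of clusters (real evaluations use the Hungarian algorithm).

```python
import itertools
import math
from collections import Counter

def clustering_accuracy(labels_true, labels_pred):
    """ACC: find the one-to-one mapping from predicted cluster IDs to
    class labels that maximizes the fraction of correct assignments.
    Brute-force over all permutations (illustrative, small k only)."""
    clusters = sorted(set(labels_pred))
    classes = sorted(set(labels_true))
    n = len(labels_true)
    best = 0.0
    for perm in itertools.permutations(classes):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, p in zip(labels_true, labels_pred)
                   if mapping[p] == t)
        best = max(best, hits / n)
    return best

def normalized_mutual_info(labels_true, labels_pred):
    """NMI: mutual information between the two labelings, normalized
    by the geometric mean of their entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    p_true = Counter(labels_true)
    p_pred = Counter(labels_pred)
    mi = sum((c / n) * math.log((c / n) / ((p_true[t] / n) * (p_pred[p] / n)))
             for (t, p), c in joint.items())
    h_true = -sum((c / n) * math.log(c / n) for c in p_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in p_pred.values())
    if h_true == 0.0 or h_pred == 0.0:
        return 0.0
    return mi / math.sqrt(h_true * h_pred)

# A relabeled-but-perfect clustering scores 1.0 on both metrics:
# the cluster IDs differ from the class labels, but the partition is identical.
print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))      # 1.0
print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0
```

Both metrics are invariant to how cluster IDs are numbered, which is why they are standard for unsupervised evaluation: only the induced partition of the data matters, not the arbitrary labels a clustering method assigns.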