Accelerating t-SNE using Tree-Based Algorithms

Authors: Laurens van der Maaten

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed experiments on five large data sets to evaluate the performance of the Barnes-Hut and dual-tree variants of t-SNE. An implementation of the two algorithms (as well as an implementation of the original t-SNE algorithm) is available from http://homepage.tudelft.nl/19j49/tsne. We describe the data sets we used in our experiments in Section 5.1. The setup of our experiments is presented in Section 5.2, and the results of our experiments are presented in Section 5.3.
Researcher Affiliation | Academia | Laurens van der Maaten EMAIL Pattern Recognition and Bioinformatics Group, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Pseudocode | No | The paper describes the algorithms and methods using mathematical equations and descriptive text, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | Source code of our tree-based t-SNE algorithms is publicly available on http://homepage.tudelft.nl/19j49/tsne; this software has recently been successfully used to create large-scale embeddings of, among others, mouse brain data (Ji, 2013), metagenomic data (Laczny et al., 2014), and word embeddings (Cho et al., 2014).
Open Datasets | Yes | We performed experiments on five data sets: (1) the MNIST data set, (2) the CIFAR-10 data set (Krizhevsky, 2009), (3) the NORB data set (Le Cun et al., 2004), (4) the street view house numbers data set (Netzer et al., 2011), and (5) the TIMIT data set.
Dataset Splits | Yes | MNIST. The MNIST data set contains N = 70,000 gray scale handwritten digit images... CIFAR-10. The CIFAR-10 data set (Krizhevsky, 2009)... We trained a convolutional network... on the training images... Street View House Numbers. The street view house numbers (SVHN) data set contains N = 630,420 labeled color images... The resulting network has a training error of 5.06% and a test error of 10.28%... TIMIT. The TIMIT data set contains 3,696 spoken utterances (with a total of N = 1,105,455 frames)... We only used the TIMIT training set in our experiments.
Hardware Specification | Yes | All computation times were measured on a laptop computer with an Intel Core i5 4258U CPU running at 2.6GHz.
Software Dependencies | No | To extract features from the images, we trained a convolutional network with three convolutional layers on the training images using Caffe (Jia, 2013). ... We trained the network to minimize the cross-entropy loss using Caffe (Jia, 2013) with one full sweep through the training data using mini-batches of size 100, a fixed learning rate of 0.001, and a momentum term of 0.9.
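The optimizer settings quoted above (cross-entropy loss, one sweep through the data, mini-batches of 100, learning rate 0.001, momentum 0.9) can be sketched outside Caffe. The following is a hypothetical stand-in: a plain-NumPy softmax classifier on synthetic data, not the paper's convolutional network, illustrating only the reported SGD-with-momentum hyperparameters.

```python
import numpy as np

# Synthetic stand-in data (the paper trained on image features).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))
y = rng.integers(0, 10, size=2000)

W = np.zeros((50, 10))       # softmax weights (stand-in for the conv net)
vW = np.zeros_like(W)        # momentum buffer
lr, momentum, batch = 0.001, 0.9, 100  # settings reported in the paper

for start in range(0, len(X), batch):  # one full sweep through the data
    xb, yb = X[start:start + batch], y[start:start + batch]
    logits = xb @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Gradient of the mean cross-entropy loss w.r.t. W.
    p[np.arange(len(yb)), yb] -= 1.0
    g = xb.T @ p / len(yb)
    vW = momentum * vW - lr * g  # momentum update
    W += vW
```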
Experiment Setup | Yes | In all experiments, we follow the experimental setup of van der Maaten and Hinton (2008) as closely as possible. In particular, we initialize the embedding E by sampling the points y_i from a Gaussian with a variance of 10^-4, and we run a gradient-descent optimizer for 1,000 iterations, setting the initial step size to 200. We update the step size during the optimization using the scheme of Jacobs (1988). We use an additional momentum term that has weight 0.5 during the first 250 iterations, and 0.8 afterwards. In all experiments, the perplexity u used to compute the input similarities is fixed to 50. All data sets were preprocessed using PCA to reduce their dimensionality to 50 before t-SNE was performed. ... In our experiments, we fix α = 12 (by contrast, van der Maaten and Hinton, 2008, used α = 4).
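The quoted setup maps closely onto scikit-learn's Barnes-Hut t-SNE. The sketch below is not the author's reference implementation; it uses synthetic stand-in data, relies on scikit-learn's default of 1,000 iterations (matching the paper), and cannot reproduce the Jacobs (1988) adaptive step-size scheme, which is specific to the original code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in data (the paper used MNIST, CIFAR-10, NORB, SVHN, TIMIT).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))

# Preprocess with PCA down to 50 dimensions, as in the paper.
X50 = PCA(n_components=50).fit_transform(X)

tsne = TSNE(
    n_components=2,
    perplexity=50,           # perplexity u fixed to 50
    early_exaggeration=12,   # alpha = 12
    learning_rate=200.0,     # initial step size 200
    init="random",           # Gaussian initialization (variance 1e-4 in sklearn)
    method="barnes_hut",     # Barnes-Hut gradient approximation
    random_state=0,
)
E = tsne.fit_transform(X50)
print(E.shape)  # (500, 2)
```

Note that scikit-learn's momentum schedule (0.5 during early exaggeration, 0.8 afterwards) also follows van der Maaten and Hinton (2008) and is not a tunable parameter here.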