Deep Learning in Target Space

Authors: Michael Fairbank, Spyridon Samothrakis, Luca Citi

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we show the performance of the target-space method on the Two-Spirals benchmark problem, and on four classic small-image vision benchmark problems for convolutional neural networks, and then we demonstrate the target-space method on some bit-stream manipulation tasks and a sentiment-analysis task for recurrent neural networks. The experiments show the effectiveness of the target-space method, in ability to train deep networks and produce improved generalisation."
Researcher Affiliation | Academia | "Michael Fairbank (EMAIL), Spyridon Samothrakis (EMAIL), Luca Citi (EMAIL); Department of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ, UK"
Pseudocode | Yes | "Algorithm 1: Feed-Forward Dynamics...; Algorithm 2: Converting Targets to Weights, in a FFNN, with Sequential Cascade Untangling (SCU)...; Algorithm 3: Calculation of Learning Gradient in Target Space...; Algorithm 4: Recurrent NN Dynamics...; Algorithm 5: Conversion of Targets to Weights for a RNN (using SCU)"
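The targets-to-weights conversion named in Algorithms 2 and 5 amounts, at its core, to a layer-wise regularised least-squares solve: given a layer's input activations X and its target matrix T, find the weights W that best realise those targets. The sketch below (numpy, not the paper's code; function names and the exact regularised form are assumptions, with `lam` playing the role of the λ regulariser in the paper's equation (7)) illustrates that idea, including the sequential cascade in which each layer's realised output becomes the next layer's input:

```python
import numpy as np

def targets_to_weights(X, T, lam=0.001):
    """Ridge-regularised least squares: W = argmin ||X W - T||^2 + lam ||W||^2.
    A sketch of the targets-to-weights step; the paper's Algorithm 2 applies
    this layer by layer with Sequential Cascade Untangling (SCU)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)

def cascade_targets_to_weights(X, targets, lam=0.001, act=np.tanh):
    """Sequential cascade: convert a list of per-layer target matrices into
    weights, feeding each layer's realised activations forward as the next
    layer's input matrix."""
    weights, h = [], X
    for T in targets:
        W = targets_to_weights(h, T, lam)
        weights.append(W)
        h = act(h @ W)  # realised output of this layer, input to the next
    return weights, h
```

With a vanishingly small `lam` and exactly realisable targets, the solve recovers the generating weights; with realistic targets it returns the best regularised fit.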
Open Source Code | Yes | "Source code for experiments is available at https://github.com/mikefairbank/dlts_paper_code"
Open Datasets | Yes | "The MNIST digit dataset: 60,000 training samples of 28-by-28 grey-scale pixellated hand-written numeric digits, each labelled from 0-9, and a test set of 10,000 samples (Le Cun et al., 2010). MNIST-Fashion dataset: 60,000 28x28 grayscale images of 10 labelled fashion categories, along with a test set of 10,000 images (Xiao et al., 2017). CIFAR10 dataset: 50,000 32x32 colour training images, labelled over 10 categories, and 10,000 test images (Krizhevsky et al., 2009). CIFAR100 dataset: 50,000 32x32 colour training images, labelled over 100 categories, and 10,000 test images (Krizhevsky et al., 2009). RNN Movie-Review Sentiment Analysis: In this final experiment we trained a RNN to solve the natural-language processing task of sentiment analysis for 50,000 movie reviews from the Internet Movie Database (IMDB) website."
Dataset Splits | Yes | "The MNIST and MNIST-Fashion datasets each provide 60,000 training samples and a 10,000-sample test set (Le Cun et al., 2010; Xiao et al., 2017). The CIFAR10 and CIFAR100 datasets each provide 50,000 training images and 10,000 test images (Krizhevsky et al., 2009). The IMDB dataset was obtained from the Tensorflow/Keras packages, with a 50-50 training/test-set split, using options of only including the top 5000 most frequent words, and padding/truncating all reviews to a length of 500 words each."
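The IMDB preprocessing quoted above (top-5000 vocabulary, reviews padded/truncated to 500 words) can be sketched without the Keras loader. `pad_and_truncate` below is a hypothetical helper, not code from the paper; it mirrors the default Keras `pad_sequences` behaviour (pre-padding with zeros, pre-truncation keeping the tail of long reviews) under the assumption that out-of-vocabulary words map to index 2, as in the Keras IMDB loader:

```python
import numpy as np

def pad_and_truncate(reviews, maxlen=500, num_words=5000, oov=2, pad=0):
    """Keep only word indices below `num_words` (others -> `oov`), then
    pre-pad with `pad` or pre-truncate each review to exactly `maxlen`."""
    out = np.full((len(reviews), maxlen), pad, dtype=np.int64)
    for i, review in enumerate(reviews):
        review = [w if w < num_words else oov for w in review]
        review = review[-maxlen:]             # truncate, keeping the tail
        out[i, maxlen - len(review):] = review  # pre-pad with zeros
    return out
```

For example, with `maxlen=5` a three-word review becomes `[0, 0, w1, w2, w3]`, and a 599-word review keeps only its final five indices.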
Hardware Specification | Yes | "All experiments were implemented using Python and Tensorflow v1.14 on a Tesla K80 GPU."
Software Dependencies | Yes | "All experiments were implemented using Python and Tensorflow v1.14 on a Tesla K80 GPU."
Experiment Setup | Yes |
Two-Spirals: "The Two-Spirals classification problem... used gradient-descent with optimal learning rates empirically determined as η = 10 for target space and η = 0.1 for weight space. The learning rate used was 0.01, which was found to be beneficial to both target space and weight space on this problem... With target space, λ = 0.001 was used for equation (7), and initial targets were randomised using a truncated normal distribution with σ = 1... For weight-space learning, the weights were randomised using the method of Glorot and Bengio (2010)."
CNN experiments: "All non-final layers used the leaky-relu activation function... trained with the cross-entropy loss function and the Adam optimizer, with learning rate 0.001 for weight-space learning, and 0.01 for target-space learning. Minibatches of size nb = 100 were randomly generated at each iteration... A fixed mini-batch of size nb = 100 was used for the targets input matrix X. In weight space, the weight initialisation used magnitudes defined by He et al. (2015)... In target space, the target values were all initially randomised with a truncated normal distribution with standard deviation 0.1... λ = 0.1 was used in equation (7). When dropout was used, it was applied with a dropout probability of 0.2 to all non-final dense layers, and all even-numbered convolutional layers. When batch normalisation was used, it was applied to every convolutional layer and to every non-final dense layer."
Bit-stream RNN experiments: "The neural network has architecture 1-(N+3)-2, with the hidden layer being fully connected to itself with recurrent connections... The hidden layer used tanh activation functions, and the final layer used softmax with cross-entropy loss function... A batch size of 8,000 random bit streams of length nt = N + 50 was used to train the network. Random mini-batches of size nb = 100 were used during each training iteration. A fixed mini-batch of size nb = 100 with nt = nt was used for the target-space matrices X(t). In weight space, the weight initialisation used magnitudes defined by Glorot and Bengio (2010). In target space, the target values were randomised with a truncated normal distribution with standard deviation 1... The networks were trained with 50,000 iterations of the Adam optimiser, with learning rate 0.001 for both weight space and target space, and with λ = 0.1 for target space."
RNN movie-review sentiment analysis: "All neural networks were trained using Adam with learning rate 0.001, and mini-batch sizes of nb = 40. The target-space algorithm used λ = 0.001. Weights and targets were initially randomised as in the previous subsection. Word embeddings were also initially randomised (using a normal distribution with µ = 0 and σ = 0.1). ...a fixed sequence of target-space input matrices X(t) was chosen, for a sequence length of just nt = 60, and mini-batch size nb = 40."
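Several of the setups above randomise initial targets with a truncated normal distribution (σ = 1 for the Two-Spirals and RNN tasks, σ = 0.1 for the CNN tasks). TensorFlow's `truncated_normal` initialiser discards draws beyond two standard deviations; a minimal rejection-sampling sketch of that behaviour (an illustrative assumption, not the paper's code) is:

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, rng=None):
    """Sample a normal distribution truncated at two standard deviations,
    mimicking TensorFlow's truncated_normal initialiser: out-of-range draws
    are rejected and redrawn until the array is filled."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(shape)
    flat = out.reshape(-1)  # view onto out, filled in place
    filled, n = 0, flat.size
    while filled < n:
        draw = rng.standard_normal(n - filled) * stddev
        keep = draw[np.abs(draw) <= 2 * stddev]  # reject beyond 2 sigma
        flat[filled:filled + keep.size] = keep
        filled += keep.size
    return out
```

Every value in the returned array is therefore bounded by 2σ in magnitude, which keeps the initial targets within the responsive range of tanh-style activations.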