Knowledge Matters: Importance of Prior Information for Optimization

Authors: Çağlar Gülçehre, Yoshua Bengio

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explored the effect of introducing prior knowledge into the intermediate level of deep supervised neural networks on two tasks. On a task we designed, all black-box state-of-the-art machine learning algorithms which we tested failed to generalize well. We motivate our work from the hypothesis that there is a training barrier involved in the nature of such tasks, and that humans learn useful intermediate concepts from other individuals by using a form of supervision or guidance using a curriculum. Our results provide positive evidence in favor of this hypothesis. In our experiments, we trained a two-tiered MLP architecture on a dataset for which each input image contains three sprites, and the binary target class is 1 if all three shapes belong to the same category and 0 otherwise.
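The labeling rule quoted above (target 1 exactly when all three sprites share a category) can be sketched in a few lines. The function name and category strings are illustrative, not taken from the paper's code.

```python
def pentomino_label(shape_categories):
    """Return 1 if all three sprite categories in an image match, else 0.

    Illustrative sketch of the task's target rule; not the paper's code.
    """
    assert len(shape_categories) == 3
    return 1 if len(set(shape_categories)) == 1 else 0

print(pentomino_label(["L", "L", "L"]))  # -> 1
print(pentomino_label(["L", "T", "L"]))  # -> 0
```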
Researcher Affiliation | Academia | Çağlar Gülçehre EMAIL Yoshua Bengio EMAIL Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC, Canada
Pseudocode | No | The paper describes the model architectures (SMLP, P1NN, P2NN) using mathematical equations and textual descriptions, but it does not include explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | The source code of some experiments presented in that paper is available at https://github.com/caglar/kmatters. (...) The source code of the structured MLP is available at the GitHub repository: https://github.com/caglar/structured_mlp. (...) The codes to reproduce these experiments are available at https://github.com/caglar/Pentomino Exps.
Open Datasets | Yes | In order to test our hypothesis, we designed an artificial dataset for object recognition using 64×64 binary images. The source code for the script that generates the artificial Pentomino datasets (Arcade-Universe) is available at: https://github.com/caglar/Arcade-Universe.
Dataset Splits | Yes | Initially the models are cross-validated by using 5-fold cross-validation. With 40,000 examples, this gives 32,000 examples for training and 8,000 examples for testing. (...) For the experimental results shown in Table 4, we used 3 training set sizes of 20k, 40k and 80k examples. We generated each dataset with different random seeds (so they do not overlap).
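The arithmetic in the split above (5 folds over 40,000 examples gives 32,000 train / 8,000 test per fold) can be reproduced with a minimal pure-Python fold generator. The paper does not specify its exact splitting code; this is a sketch under that assumption.

```python
def five_fold_indices(n_examples, n_folds=5):
    """Partition indices into n_folds (train, test) splits, contiguous folds.

    Illustrative only; the paper's actual cross-validation code is not given.
    """
    fold_size = n_examples // n_folds
    folds = []
    for k in range(n_folds):
        lo, hi = k * fold_size, (k + 1) * fold_size
        test = list(range(lo, hi))
        train = [i for i in range(n_examples) if not (lo <= i < hi)]
        folds.append((train, test))
    return folds

folds = five_fold_indices(40_000)
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 5 32000 8000
```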
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models) used for running its experiments. It mentions using 'a Theano... implementation of Convolutional Neural Networks', which implies computation on CPU/GPU, but no specific models are named.
Software Dependencies | No | The paper mentions several software packages like 'scikit-learn', 'libsvm', 'Theano', and 'pylearn2' but does not provide specific version numbers for any of them.
Experiment Setup | Yes | The P1NN has a highly overcomplete architecture with 1024 hidden units per patch, and L1 and L2 weight decay regularization coefficients on the weights (not the biases) are respectively 1e-6 and 1e-5. The learning rate for the P1NN is 0.75. (...) The P2NN has 2048 hidden units. L1 and L2 penalty coefficients for the P2NN are 1e-6, and the learning rate is 0.1. (...) With extensive hyperparameter optimization and using standardization in the intermediate level of the SMLP with softmax nonlinearity, SMLP-nohints was able to get 5.3% training and 6.7% test error on the 80k Pentomino training dataset. (...) We used 2050 hidden units in the P1NN, 11 softmax outputs per patch, and 1024 hidden units in the P2NN. The network was trained with a learning rate of 0.1 without using any adaptive learning rate. The SMLP uses a rectifier nonlinearity for the hidden layers of both P1NN and P2NN. We also applied a small amount of L1 and L2 regularization on the weights of the network.
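The first set of hyperparameters quoted above can be collected in one place, together with the combined L1/L2 weight-decay term they parameterize (applied to weights, not biases). The dictionary layout and function name are illustrative assumptions, not the paper's configuration format.

```python
import numpy as np

# Hyperparameters as quoted from the paper (first reported configuration);
# the dict structure itself is a hypothetical sketch.
P1NN = {"hidden_units": 1024, "l1": 1e-6, "l2": 1e-5, "learning_rate": 0.75}
P2NN = {"hidden_units": 2048, "l1": 1e-6, "l2": 1e-6, "learning_rate": 0.1}

def weight_penalty(W, l1, l2):
    """Combined L1/L2 weight-decay term on a weight matrix (biases excluded)."""
    return l1 * np.abs(W).sum() + l2 * (W ** 2).sum()

W = np.ones((4, 4))  # toy weight matrix: 16 entries of 1.0
print(weight_penalty(W, P1NN["l1"], P1NN["l2"]))  # 16*1e-6 + 16*1e-5 = 1.76e-4
```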