Knowledge Matters: Importance of Prior Information for Optimization
Authors: Çağlar Gülçehre, Yoshua Bengio
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explored the effect of introducing prior knowledge into the intermediate level of deep supervised neural networks on two tasks. On a task we designed, all black-box state-of-the-art machine learning algorithms that we tested failed to generalize well. We motivate our work from the hypothesis that there is a training barrier involved in the nature of such tasks, and that humans learn useful intermediate concepts from other individuals by using a form of supervision or guidance using a curriculum. Our results provide positive evidence in favor of this hypothesis. In our experiments, we trained a two-tiered MLP architecture on a dataset for which each input image contains three sprites, and the binary target class is 1 if all three shapes belong to the same category and 0 otherwise. |
| Researcher Affiliation | Academia | Çağlar Gülçehre EMAIL Yoshua Bengio EMAIL Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC, Canada |
| Pseudocode | No | The paper describes the model architectures (SMLP, P1NN, P2NN) using mathematical equations and textual descriptions, but it does not include explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | The source code of some experiments presented in that paper is available at https://github.com/caglar/kmatters. (...) The source code of the structured MLP is available at the GitHub repository: https://github.com/caglar/structured_mlp. (...) The codes to reproduce these experiments are available at https://github.com/caglar/PentominoExps. |
| Open Datasets | Yes | In order to test our hypothesis, we designed an artificial dataset for object recognition using 64×64 binary images. The source code for the script that generates the artificial Pentomino datasets (Arcade-Universe) is available at: https://github.com/caglar/Arcade-Universe. |
| Dataset Splits | Yes | Initially the models are cross-validated by using 5-fold cross-validation. With 40,000 examples, this gives 32,000 examples for training and 8,000 examples for testing. (...) For the experimental results shown in Table 4, we used 3 training set sizes of 20k, 40k and 80k examples. We generated each dataset with different random seeds (so they do not overlap). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models) used for running its experiments. It mentions using 'a Theano... implementation of Convolutional Neural Networks', which implies computation on CPU/GPU, but no specific models are named. |
| Software Dependencies | No | The paper mentions several software packages like 'scikit-learn', 'libsvm', 'Theano', and 'pylearn2' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The P1NN has a highly overcomplete architecture with 1024 hidden units per patch, and L1 and L2 weight decay regularization coefficients on the weights (not the biases) are respectively 1e-6 and 1e-5. The learning rate for the P1NN is 0.75. (...) The P2NN has 2048 hidden units. L1 and L2 penalty coefficients for the P2NN are 1e-6, and the learning rate is 0.1. (...) With extensive hyperparameter optimization and using standardization in the intermediate level of the SMLP with softmax nonlinearity, SMLP-nohints was able to get 5.3% training and 6.7% test error on the 80k Pentomino training dataset. (...) We used 2050 hidden units in the P1NN, 11 softmax outputs per patch, and 1024 hidden units in the P2NN. The network was trained with a learning rate of 0.1 without using any adaptive learning rate. The SMLP uses a rectifier nonlinearity for hidden layers of both P1NN and P2NN. We also applied a small amount of L1 and L2 regularization on the weights of the network. |
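The binary target described in the Research Type row (class 1 iff all three sprites in the image share a category) can be sketched as a one-line rule. The function name and the category-id encoding are illustrative; the actual dataset encodes sprites as pixels in a 64×64 binary image.

```python
def pentomino_label(sprite_categories):
    """Hypothetical labeling rule for the Pentomino task: the binary
    target is 1 if all three sprites in an image belong to the same
    category, and 0 otherwise."""
    assert len(sprite_categories) == 3
    # One distinct category id among the three sprites -> positive class.
    return 1 if len(set(sprite_categories)) == 1 else 0
```

This makes explicit why the task is hard for black-box learners: the label depends on an equality relation over latent sprite identities, not on any local pixel statistic.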
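The Experiment Setup row describes the two-tiered SMLP: a patch-level P1NN with rectifier hidden units and 11 softmax outputs per patch, whose concatenated outputs feed a P2NN that predicts the binary target. The forward pass below is a sketch under stated assumptions, not the authors' implementation: the 8×8 patch size, the random weight initialization, and the sigmoid output for the binary class are assumptions; the per-patch hidden size (1024), per-patch softmax width (11), and P2NN hidden size (2048) are taken from the quoted setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

PATCH = 8          # assumed patch size for the 64x64 input
P1_HIDDEN = 1024   # per-patch hidden units (quoted: "highly overcomplete")
P1_OUT = 11        # softmax outputs per patch (quoted)
P2_HIDDEN = 2048   # P2NN hidden units (quoted)
N_PATCHES = (64 // PATCH) ** 2

# Hypothetical randomly initialized parameters; P1NN weights are shared
# across all patches, mirroring the patch-wise structure of the SMLP.
W1 = rng.normal(0, 0.01, (PATCH * PATCH, P1_HIDDEN)); b1 = np.zeros(P1_HIDDEN)
Wo = rng.normal(0, 0.01, (P1_HIDDEN, P1_OUT));        bo = np.zeros(P1_OUT)
W2 = rng.normal(0, 0.01, (N_PATCHES * P1_OUT, P2_HIDDEN)); b2 = np.zeros(P2_HIDDEN)
W3 = rng.normal(0, 0.01, (P2_HIDDEN, 1));             b3 = np.zeros(1)

def smlp_forward(image):
    """Forward pass of the sketched SMLP on one 64x64 binary image."""
    # Cut the image into non-overlapping PATCH x PATCH tiles.
    tiles = image.reshape(64 // PATCH, PATCH, 64 // PATCH, PATCH)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(N_PATCHES, PATCH * PATCH)
    h1 = relu(tiles @ W1 + b1)        # P1NN rectifier hidden layer
    z = softmax(h1 @ Wo + bo)         # per-patch 11-way softmax outputs
    h2 = relu(z.reshape(-1) @ W2 + b2)  # P2NN hidden layer on concatenated outputs
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))  # assumed sigmoid binary output

prob = smlp_forward(rng.integers(0, 2, (64, 64)).astype(float))
```

The intermediate softmax layer is where the paper's "hints" attach: with hints, each patch's 11-way output is supervised to predict the sprite class (or "empty") in that patch before the P2NN combines them.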