Exploring the Learning Mechanisms of Neural Division Modules
Authors: Bhumika Mistry, Katayoun Farrahi, Jonathon Hare
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In total we measure robustness over 475 different training sets for setups with and without input redundancy. We discover robustness is greatly affected by the input sign for the Real NPU and NRU, input magnitude for the NMRU and input distribution for every module. Despite this issue, we show that the modules can learn as part of larger end-to-end networks. |
| Researcher Affiliation | Academia | Bhumika Mistry (EMAIL), Katayoun Farrahi (EMAIL), and Jonathon Hare (EMAIL): Vision, Learning and Control group, Electronics and Computer Science, University of Southampton. |
| Pseudocode | No | The paper describes mathematical models and equations, but does not present any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code (MIT license) available at: https://github.com/bmistry4/nalm-division. |
| Open Datasets | Yes | We show how NALMs can learn in a larger end-to-end network using an arithmetic MNIST task. Following Bloice et al. (2021), the dataset contains permutation pairs of MNIST digits placed side-by-side, with the target label being the product of the digits, e.g., an input showing the digits 4 and 1 with output 4 (= 4 × 1). |
| Dataset Splits | Yes | All experiments use a mean squared error (MSE) loss with an Adam optimiser (Kingma & Ba, 2015) and 10,000 samples each for the validation and test sets. Training uses batch sizes of 128, and the best model for evaluation is taken using early stopping on the validation set. All runs are over 25 different seeds. All inputs are required in the no-redundancy setting, i.e., input size of 2. Training takes 50,000 iterations, where each iteration consists of a different batch. ... In contrast, the redundancy setting uses an input size of 10, where 8 input values are not required for the final output. The total training iterations are extended to 100,000. ... For the MNIST task: train:test split 90:10; batch size 128; train samples 72,000 (1 fold) / 73,000 (9 folds); test samples 9,000 (1 fold) / 8,000 (9 folds); 10 folds/seeds. |
| Hardware Specification | Yes | All Real NPU experiments were run on Iridis 5 (the University of Southampton's supercomputer), where a compute node has 40 CPUs with 192 GB of DDR4 memory and dual 2.0 GHz Intel Skylake processors. All NRU and NMRU experiments were run on a 16-core CPU server with 125 GB of memory and 1.2 GHz processors. All experiments for the MNIST tasks were trained using a single GeForce GTX 1080 GPU. |
| Software Dependencies | No | All experiments use a mean squared error (MSE) loss with an Adam optimiser (Kingma & Ba, 2015)... We choose this precision as it can be guaranteed when working with 32-bit PyTorch tensors. The paper mentions PyTorch and the Adam optimizer but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Default parameters: A summary of all relevant parameters is found in Appendix C. All experiments use a mean squared error (MSE) loss with an Adam optimiser (Kingma & Ba, 2015) and 10,000 samples each for the validation and test sets. Training uses batch sizes of 128, and the best model for evaluation is taken using early stopping on the validation set. All runs are over 25 different seeds. ... The Real NPU uses a learning rate of 5e-3 with sparsity regularisation scaling during iterations 40,000 to 50,000. The NRU and NMRU use sparsity regularisation scaling during iterations 20,000 to 35,000 and learning rates of 1 and 1e-2, respectively. ... For MNIST: 1000 epochs; learning rate 1e-3; sparsity regularisation scaled from λ_start to λ_end over epochs 30-40. |
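The setup rows above can be condensed into code. The following is a minimal sketch, not the authors' implementation: the hyperparameter dictionaries restate values quoted in the table, and the `EarlyStopper` helper is a hypothetical illustration of the stated "best model via early stopping on the validation set" procedure.

```python
# Per-module defaults as quoted in the Experiment Setup row
# (no-redundancy setting; iteration counts from the Dataset Splits row).
REAL_NPU = {"lr": 5e-3, "batch_size": 128, "iterations": 50_000,
            "sparsity_reg_iters": (40_000, 50_000)}
NRU = {"lr": 1.0, "batch_size": 128, "iterations": 50_000,
       "sparsity_reg_iters": (20_000, 35_000)}
NMRU = {"lr": 1e-2, "batch_size": 128, "iterations": 50_000,
        "sparsity_reg_iters": (20_000, 35_000)}


class EarlyStopper:
    """Illustrative tracker: remember the step with the lowest
    validation loss so that checkpoint can be used for evaluation."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_step = -1

    def update(self, step, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_step = step
            return True  # caller would checkpoint the model here
        return False


# Toy usage with a made-up validation-loss trace.
stopper = EarlyStopper()
for step, loss in enumerate([0.9, 0.5, 0.7, 0.3, 0.4]):
    stopper.update(step, loss)
print(stopper.best_step, stopper.best_loss)  # -> 3 0.3
```

Whether the paper checkpoints every iteration or at fixed validation intervals is not stated in the table; the sketch leaves that policy to the caller.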