Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Primer for Neural Arithmetic Logic Modules
Authors: Bhumika Mistry, Katayoun Farrahi, Jonathon Hare
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To alleviate the existing inconsistencies, we create a benchmark which compares all existing arithmetic NALMs. ... We compare existing findings across modules. ... Therefore, we provide results on a Single Module Arithmetic Task, training modules on their respective operations over a range of different interpolation distributions and testing over a range of extrapolation distributions. ... We present the NALMs' performances on the four main arithmetic operations. Each figure consists of plots for each evaluation metric (success rate, speed of convergence and sparsity error) discussed in the evaluation paragraph above, with confidence intervals calculated over 25 seeds. |
| Researcher Affiliation | Academia | Bhumika Mistry EMAIL Katayoun Farrahi EMAIL Jonathon Hare EMAIL Department of Vision, Learning, and Control, Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom |
| Pseudocode | No | The paper provides mathematical definitions for the modules (e.g., Equations 1-5 for NALU, Equations 6-12 for iNALU) and architectural illustrations (Figures 2-9). Appendix C also provides a 'Step-by-step Example using the NALU', which details calculations in a narrative format, but there are no structured pseudocode or algorithm blocks explicitly labeled as such. |
| Open Source Code | Yes | Code is available at: https://github.com/bmistry4/nalm-benchmark |
| Open Datasets | Yes | MNIST is also used to evaluate NALU's abilities on being part of end-to-end applications. ... Madsen and Johansen (2020) also use MNIST for testing the module's abilities to act as a recurrent module for adding/multiplying the digits. ... Interpolation (train/validation) and extrapolation (test) ranges are presented in Table 3. Data (as floats) is drawn from a Uniform distribution with the range values as the lower and upper bounds. |
| Dataset Splits | Yes | Interpolation (training/validation) and extrapolation (test) ranges are presented in Table 3. ... Table 3: Interpolation (train/validation) and extrapolation (test) ranges used for the Single Module Arithmetic Task. Data (as floats) is drawn from a Uniform distribution with the range values as the lower and upper bounds. |
| Hardware Specification | No | The authors acknowledge the use of the IRIDIS High Performance Computing Facility, the ECS Alpha Cluster, and associated support services at the University of Southampton in the completion of this work. This refers to computing facilities but lacks specific details like CPU/GPU models or memory amounts. |
| Software Dependencies | No | Table 2 lists 'Programming framework Pytorch (Python) Flux (Julia) Tensorflow (Python)' for different experiment setups. However, specific version numbers for these frameworks or any other software dependencies are not provided. |
| Experiment Setup | Yes | Setup. A single module is used. The input size is two and output size is one, hence there is no input redundancy. Hence, the objective is to model: y = x1 ∘ x2 where ∘ ∈ {+, −, ×, ÷}. We test the: NALU, iNALU, G-NALU, NAC+, NAC•, NAU, NMU, NPU, and Real NPU. Each run trains for 50,000 iterations to allow for enough iterations until convergence. An MSE loss is used with an Adam optimiser. Interpolation (training/validation) and extrapolation (test) ranges are presented in Table 3. Early stopping is applied using a validation dataset sampled from the interpolation range. Experiment/hyper-parameters set can be found in Appendix D. ... Appendix D (Table 5, 6, 7, 8) contains detailed parameters like 'Total iterations 50000', 'Learning rate 1.00E-03', 'Optimiser Adam (with default parameters)', and specific parameters for NPU, Real NPU, NAU, NMU, and iNALU. |
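The Pseudocode row notes that the paper defines the modules via equations rather than algorithm blocks (e.g., Equations 1-5 for NALU). As context for readers, a minimal scalar-output sketch of the standard NALU forward pass (following Trask et al., 2018, not this paper's exact notation) looks like the following; the parameter names `w_hat`, `m_hat`, and `g` are illustrative:

```python
import math

EPS = 1e-7  # stabiliser inside log, as in the NALU definition


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nalu_forward(x, w_hat, m_hat, g):
    """Scalar-output NALU on an input vector x (illustrative sketch)."""
    # Weight construction: tanh * sigmoid biases entries toward {-1, 0, 1}.
    w = [math.tanh(wh) * sigmoid(mh) for wh, mh in zip(w_hat, m_hat)]
    # Additive path (NAC+): a plain linear combination.
    a = sum(wi * xi for wi, xi in zip(w, x))
    # Multiplicative path: exp-sum-log turns products into sums.
    m = math.exp(sum(wi * math.log(abs(xi) + EPS) for wi, xi in zip(w, x)))
    # Learned gate interpolates between the two paths.
    gate = sigmoid(sum(gi * xi for gi, xi in zip(g, x)))
    return gate * a + (1.0 - gate) * m
```

With saturated weights the module recovers exact arithmetic: pushing `w` toward 1 and the gate toward 1 yields addition, while pushing the gate toward 0 yields multiplication, which is the behaviour the benchmark's success-rate metric tests for.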
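The data-generation protocol quoted above (floats drawn from a Uniform distribution, with separate interpolation and extrapolation ranges) can be sketched as follows. The ranges 1.0-2.0 and 2.0-6.0 are hypothetical placeholders, not the actual values from the paper's Table 3:

```python
import random


def make_batch(n, lo, hi, op):
    """Sample n two-input examples from Uniform[lo, hi) for one operation."""
    ops = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b, "div": lambda a, b: a / b}
    f = ops[op]
    xs, ys = [], []
    for _ in range(n):
        x1, x2 = random.uniform(lo, hi), random.uniform(lo, hi)
        xs.append((x1, x2))
        ys.append(f(x1, x2))  # target is the exact arithmetic result
    return xs, ys


# Hypothetical ranges; the paper's Table 3 lists the ones actually used.
train_x, train_y = make_batch(128, 1.0, 2.0, "mul")  # interpolation (train)
test_x, test_y = make_batch(128, 2.0, 6.0, "mul")    # extrapolation (test)
```

The validation set used for early stopping would be drawn from the interpolation range in the same way, matching the setup quoted in the table.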