HADAMRNN: BINARY AND SPARSE TERNARY ORTHOGONAL RNNS
Authors: Armand Foucault, Francois Malgouyres, Franck Mamalet
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting ORNNs, named HadamRNN and Block-HadamRNN, are evaluated on various benchmarks, including the copy task, permuted and sequential MNIST tasks, the IMDB dataset, two GLUE benchmarks, and two IoT benchmarks. Despite binarization or sparse ternarization, these RNNs maintain performance levels comparable to state-of-the-art full-precision models, highlighting the effectiveness of our approach. Notably, our approach is the first solution with binary recurrent weights capable of tackling the copy task over 1000 timesteps. |
| Researcher Affiliation | Collaboration | Armand Foucault, Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, UPS IMT, F-31062 Toulouse Cedex 9, France; Franck Mamalet, Institut de Recherche Technologique Saint Exupéry, Toulouse, France; François Malgouyres, Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, UPS IMT, F-31062 Toulouse Cedex 9, France |
| Pseudocode | No | The paper describes the Straight-through Estimator mathematically in Section C.1 and provides recurrence relations for RNNs using equations in Section 3. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, step-by-step procedures. |
| Open Source Code | Yes | The code implementing the experiments is available at the hadamRNN repository. |
| Open Datasets | Yes | The resulting ORNNs, named HadamRNN and Block-HadamRNN, are evaluated on various benchmarks, including the copy task, permuted and sequential MNIST tasks, the IMDB dataset, two GLUE benchmarks, and two IoT benchmarks. |
| Dataset Splits | Yes | D.1 COPY TASK We generated 512K samples for the training set, and 2K samples for both validation and test. D.2 PERMUTED / SEQUENTIAL MNIST We used 50K samples for training, 10K samples for validation and 10K samples for testing. D.3 IMDB DATASET The IMDB dataset contains 50,000 samples. Among these, 25,000 samples are used for training, and the remaining 25,000 are equally divided between validation and testing. |
| Hardware Specification | Yes | Experiments were done on an NVIDIA GeForce RTX 3080 GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer Kingma and Ba (2015)' and 'Glorot initialization method Glorot and Bengio (2010)', which are algorithms, but does not provide specific software names with version numbers for implementation (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | D.1 COPY TASK We generated 512K samples for the training set, and 2K samples for both validation and test. HadamRNN and Block-HadamRNN were trained using the Adam optimizer Kingma and Ba (2015). We used a batch size of 128 samples. The learning rate is initialized to 1e-4 and decayed exponentially by applying a factor 0.98 after each epoch. 10 epochs were used for training. D.2 PERMUTED / SEQUENTIAL MNIST We used 50K samples for training, 10K samples for validation and 10K samples for testing. HadamRNN and Block-HadamRNN were trained using the Adam optimizer Kingma and Ba (2015). We used a batch size of 64 samples. The learning rate is initialized to 1e-3 and decayed exponentially by applying a factor 0.98 after each epoch. 200 epochs were used for training. D.3 IMDB DATASET We used a batch size of 100 samples. The learning rate is initialized to 5e-4 and decayed exponentially by applying a factor 0.99 after each epoch. 30 epochs were used for training. |
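The straight-through estimator (STE) noted in the Pseudocode row can be illustrated with a minimal NumPy sketch. This is a generic picture of STE-based binarization, not the paper's implementation; the sign-based quantizer and the unit clipping threshold are assumptions for illustration:

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: quantize full-precision weights to {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def ste_backward(grad_out, w, clip=1.0):
    """Backward pass: the quantizer's gradient is zero almost everywhere,
    so STE passes the incoming gradient straight through, masking it
    where |w| exceeds the (assumed) clipping threshold."""
    return grad_out * (np.abs(w) <= clip)

w = np.array([0.3, -0.7, 1.5, -0.1])
b = binarize_forward(w)                # -> [ 1., -1.,  1., -1.]
g = ste_backward(np.ones_like(w), w)   # -> [ 1.,  1.,  0.,  1.]
```

The full-precision weights `w` are the ones updated by the optimizer; only their binarized copies `b` are used in the forward pass.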
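The exponential learning-rate decay described in the Experiment Setup row (a fixed multiplicative factor applied after each epoch) reduces to a one-line schedule; the function name here is illustrative:

```python
def lr_at_epoch(lr0, decay, epoch):
    """Learning rate after `epoch` decay steps: lr0 * decay**epoch."""
    return lr0 * decay ** epoch

# Copy task setting from the paper: lr0 = 1e-4, factor 0.98, 10 epochs.
schedule = [lr_at_epoch(1e-4, 0.98, e) for e in range(10)]
```

With the IMDB setting (5e-4, factor 0.99, 30 epochs), the same function gives the final-epoch rate as `lr_at_epoch(5e-4, 0.99, 29)`.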