TinyFoA: Memory Efficient Forward-Only Algorithm for On-Device Learning
Authors: Baichuan Huang, Amir Aminifar
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our proposed TinyFoA against BP and other forward-only algorithms and demonstrate its effectiveness and superiority compared to state-of-the-art forward-only algorithms in terms of classification performance and training memory overhead, reducing the memory overheads by an order of magnitude. We evaluate our proposed approach on MNIST (LeCun 1998), CIFAR-10 (Krizhevsky 2009), and CIFAR-100 (Krizhevsky 2009). In addition, we extend our evaluation to a real-world medical IoT application on cardiac arrhythmia classification based on the MIT-BIH Arrhythmia Electrocardiogram (ECG) dataset (Mark et al. 1982). |
| Researcher Affiliation | Academia | Department of Electrical and Information Technology, Lund University, Sweden EMAIL |
| Pseudocode | No | The paper provides equations and descriptive text for the Forward, Gradient, and Update steps of the TinyFoA algorithm in Section 2.1, but it does not present them within a formally labeled 'Pseudocode' or 'Algorithm' block, nor is it formatted like typical code. |
| Open Source Code | Yes | Code: https://github.com/whubaichuan/TinyFoA |
| Open Datasets | Yes | We evaluate our proposed approach on MNIST (LeCun 1998), CIFAR-10 (Krizhevsky 2009), and CIFAR-100 (Krizhevsky 2009). In addition, we extend our evaluation to a real-world medical IoT application on cardiac arrhythmia classification based on the MIT-BIH Arrhythmia Electrocardiogram (ECG) dataset (Mark et al. 1982). |
| Dataset Splits | No | The paper mentions using well-known datasets and states "Our experiments utilize balanced datasets", but it does not provide specific percentages, sample counts, or references to predefined train/test/validation splits for these datasets. |
| Hardware Specification | Yes | All these algorithms undergo training on a server equipped with two 16-core Intel(R) Xeon(R) Gold 6226R (Skylake) Central Processing Units (CPUs) and one NVIDIA Tesla T4 Graphics Processing Unit (GPU). |
| Software Dependencies | No | The paper mentions "PyTorch" in the context of memory management but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | A three-hidden-layer network is considered for PEPITA (because PEPITA with deeper networks exhibits a decrease in performance (Pau and Aymone 2023; Srinivasan et al. 2023)) and a four-hidden-layer network for the other algorithms. Finally, we further consider the vertical layer-wise training in TinyFoA and introduce it also to BP (i.e., BP+BW+BA+V) and conduct the comparison, where the default number of slices M is set to 2. |
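The "vertical layer-wise training" with M slices mentioned in the experiment setup can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: it assumes "vertical" slicing means partitioning a layer's weight matrix column-wise into M slices and updating one slice at a time with a local ReLU gradient, so only one slice's activations and gradients are held in memory at once. All function names and the update rule are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_slice(x, W_slice):
    """Forward pass through one vertical (column-wise) slice of a layer (ReLU)."""
    return np.maximum(x @ W_slice, 0.0)

def train_layer_vertically(x, W, target, lr=0.01, M=2):
    """Update each of the M vertical slices of W independently, one at a
    time, using a local least-squares-style ReLU gradient (illustrative)."""
    slices = np.split(W, M, axis=1)        # partition weights column-wise
    targets = np.split(target, M, axis=1)  # matching slices of the local target
    for i in range(M):
        h = forward_slice(x, slices[i])            # slice activations only
        err = h - targets[i]                       # local error for this slice
        grad = x.T @ (err * (h > 0))               # ReLU-masked gradient
        slices[i] -= lr * grad                     # update this slice in place
    return np.concatenate(slices, axis=1)

# Toy usage: batch of 8 inputs, 16 features, 4 hidden units, M=2 slices.
x = rng.standard_normal((8, 16))
W = rng.standard_normal((16, 4)) * 0.1
target = rng.standard_normal((8, 4))
W_new = train_layer_vertically(x, W, target, M=2)
print(W_new.shape)  # (16, 4)
```

The memory saving in this sketch comes from processing one (16, 2) slice at a time instead of the full (16, 4) layer; with M=2, the peak activation/gradient buffers are halved, which is the kind of trade-off the BP+BW+BA+V comparison probes.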