TinyFoA: Memory Efficient Forward-Only Algorithm for On-Device Learning

Authors: Baichuan Huang, Amir Aminifar

AAAI 2025

Reproducibility variables, with the assessed result and the LLM's supporting response for each:
Research Type: Experimental. "We extensively evaluate our proposed TinyFoA against BP and other forward-only algorithms and demonstrate its effectiveness and superiority compared to state-of-the-art forward-only algorithms in terms of classification performance and training memory overhead, reducing the memory overheads by an order of magnitude. We evaluate our proposed approach on MNIST (LeCun 1998), CIFAR-10 (Krizhevsky 2009), and CIFAR-100 (Krizhevsky 2009). In addition, we extend our evaluation to a real-world medical IoT application on cardiac arrhythmia classification based on the MIT-BIH Arrhythmia Electrocardiogram (ECG) dataset (Mark et al. 1982)."
Researcher Affiliation: Academia. Department of Electrical and Information Technology, Lund University, Sweden.
Pseudocode: No. The paper provides equations and descriptive text for the Forward, Gradient, and Update steps of the TinyFoA algorithm in Section 2.1, but it does not present them in a formally labeled "Pseudocode" or "Algorithm" block, nor in code-like formatting.
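Although the paper gives no pseudocode, its Forward, Gradient, and Update steps follow the general shape of forward-only, layer-wise training, where each layer is fitted against a local objective and no gradients flow between layers. The sketch below illustrates that general pattern only; the layer sizes, the ReLU/softmax choices, and the local linear read-out are my assumptions for illustration, not TinyFoA's actual equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_layer_locally(x, y_onehot, n_hidden, lr=0.1, steps=50):
    """Hypothetical local training of one layer: the Forward step computes
    h = relu(x @ W); the Gradient and Update steps use only a local linear
    read-out C on h, never any layer above (forward-only in that sense)."""
    n_in, n_cls = x.shape[1], y_onehot.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))   # layer weights
    C = rng.normal(0, 0.1, (n_hidden, n_cls))  # local classifier head
    for _ in range(steps):
        h = np.maximum(x @ W, 0.0)             # Forward
        p = softmax(h @ C)
        g = (p - y_onehot) / len(x)            # Gradient (local loss only)
        dC = h.T @ g
        dh = (g @ C.T) * (h > 0)
        dW = x.T @ dh
        C -= lr * dC                           # Update
        W -= lr * dW
    return W, np.maximum(x @ W, 0.0)           # activations passed upward

# Toy data: two Gaussian blobs; two layers trained one after another,
# each seeing only the (detached) activations of the layer below.
x = np.vstack([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.zeros((100, 2)); y[:50, 0] = 1; y[50:, 1] = 1
_, h1 = train_layer_locally(x, y, n_hidden=8)
_, h2 = train_layer_locally(h1, y, n_hidden=8)
```

Because each layer's update depends only on its own input and local head, activations of other layers never need to be stored, which is the source of the memory savings the paper reports.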
Open Source Code: Yes. Code: https://github.com/whubaichuan/TinyFoA
Open Datasets: Yes. "We evaluate our proposed approach on MNIST (LeCun 1998), CIFAR-10 (Krizhevsky 2009), and CIFAR-100 (Krizhevsky 2009). In addition, we extend our evaluation to a real-world medical IoT application on cardiac arrhythmia classification based on the MIT-BIH Arrhythmia Electrocardiogram (ECG) dataset (Mark et al. 1982)."
Dataset Splits: No. The paper mentions using well-known datasets and states "Our experiments utilize balanced datasets", but it does not provide specific percentages, sample counts, or references to predefined train/test/validation splits for these datasets.
Hardware Specification: Yes. "All these algorithms undergo training on a server equipped with 2 16-core Intel(R) Xeon(R) Gold 6226R (Skylake) Central Processing Units (CPUs) and 1 NVIDIA Tesla T4 Graphics Processing Unit (GPU)."
Software Dependencies: No. The paper mentions PyTorch in the context of memory management but does not provide specific version numbers for any software dependencies.
Experiment Setup: Yes. "A three-hidden-layer network is considered for PEPITA (because PEPITA with deeper networks exhibits a decrease in performance (Pau and Aymone 2023; Srinivasan et al. 2023)) and a four-hidden-layer network for the other algorithms. Finally, we further consider the vertical layer-wise training in TinyFoA and introduce it also to BP (i.e., BP+BW+BA+V) and conduct the comparison, where the default number of slices M is set to 2."
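The "vertical layer-wise training" with M = 2 slices mentioned in the setup can be pictured as splitting a layer's weight matrix into M vertical (column) slices and processing one slice at a time, so only a fraction of the layer's output activations is materialized at once. This is a minimal sketch of that slicing idea under my own assumptions about how the slices are formed; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2                              # default number of slices in the paper
x = rng.normal(size=(32, 16))      # a batch of inputs (sizes are illustrative)
W = rng.normal(size=(16, 64))      # full weight matrix of one layer

def forward_in_slices(x, W, M):
    """Compute relu(x @ W) one vertical slice of W at a time, so each
    intermediate holds only ~1/M of the layer's output columns."""
    out_slices = []
    for Ws in np.array_split(W, M, axis=1):      # vertical (column) slices
        out_slices.append(np.maximum(x @ Ws, 0.0))
    return np.concatenate(out_slices, axis=1)

h_sliced = forward_in_slices(x, W, M)
h_full = np.maximum(x @ W, 0.0)
assert np.allclose(h_sliced, h_full)  # slicing changes peak memory, not the result
```

Since the slices are independent column blocks, the sliced forward pass is numerically identical to the unsliced one; the benefit is only in peak activation memory, which is why the report notes it can be grafted onto BP (BP+BW+BA+V) as well.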