Learning Using a Single Forward Pass
Authors: Aditya Somasundaram, Pushkal Mishra, Ayon Borthakur
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted in this paper indicate that SPELA closely matches backpropagation in performance and, due to its computational efficiency, maintains an edge over it in resource-constrained scenarios. Moreover, SPELA can efficiently fine-tune models trained with backpropagation (transfer learning). In addition, extending SPELA to convolutional neural networks (CNNs) allows for complex image classification. Complexity analysis: theoretical bounds for peak memory usage show that SPELA can edge over backpropagation in the analyzed settings. The authors perform an in-depth analysis of SPELA's capacity, first examining SPELA's learning dynamics on the standard MNIST 10 dataset. |
| Researcher Affiliation | Academia | Aditya Somasundaram (Columbia University), Pushkal Mishra (University of California San Diego), Ayon Borthakur (IIT Guwahati) |
| Pseudocode | Yes | Algorithm 1: Training MLP with SPELA; Algorithm 2: Training and Inference from CNN i-th layer with SPELA; Algorithm 3: Inference on MLP trained with SPELA |
| Open Source Code | No | The paper does not explicitly state that source code is provided or offer a link to a repository for the methodology described. |
| Open Datasets | Yes | Further, SPELA is extended with significant modifications to train CNN networks, which the authors evaluate on the CIFAR-10, CIFAR-100, and SVHN 10 datasets, showing equivalent performance compared to backpropagation. They perform an in-depth analysis of SPELA's capacity, first examining SPELA's learning dynamics on the standard MNIST 10 dataset. Table 3 describes the key comparison results on the MNIST 10, KMNIST 10, and FMNIST datasets (selected because they are well suited for a multilayer perceptron classification task). In these experiments, the networks are trained for 200 epochs with a learning rate of 0.1. Transfer learning analysis is done on six datasets: Aircraft 100, CIFAR 10, CIFAR 100, Flowers 102, Food 101, and Pets 37 (the numbers denote the number of classes in each dataset). |
| Dataset Splits | Yes | Top-1 and top-5 accuracies are computed for varying amounts of training data (keeping the test dataset fixed). The canonical transfer learning approach is followed, wherein the classifier head is replaced by a layer sized to the number of classes. Figures 6 and 8 and Tables 10, 11, 12, 13, 14, and 15 describe the performance on the six datasets: accuracy (after 200 epochs of fine-tuning) of SPELA, SPELA 5x, and backpropagation-trained networks for train dataset size percentages of 1, 5, 10, 25, 50, 75, and 100 during transfer learning (keeping the test dataset fixed). |
| Hardware Specification | Yes | We used an NVIDIA RTX 4500 Ada generation GPU for all our studies. |
| Software Dependencies | No | The paper mentions using PyTorch Hub but does not specify its version or any other software dependencies with their respective version numbers. |
| Experiment Setup | Yes | Full experimental details are provided in the appendix tables. Tables 19 and 20: Experiment Details for SPELA MLP (Layer 1, Layer 2, Learning rate, Decay rate, Decay epoch, Batch size, # Epochs, Dropout, Weight init, Bias, Optimizer, Activation, Loss); Table 21: Experiment Details of SPELA on Transfer Learning (Layer 1, Learning rate, Decay rate, Batch size, # Epochs, Dropout, Weight init, Bias, Optimizer, Loss); Table 22: Experiment Details of SPELA MLP for Ablation Studies (Layer 1, Layer 2, Learning rate, Decay rate, Decay epoch, Batch size, # Epochs, Dropout, Weight init, Bias, Optimizer, Activation, Loss); Table 23: Experimental details of SPELA convolutional neural network (Input size, Conv, MLP, Learning rate, Decay rate, Batch size, # Epochs, Dropout, Weight init, Bias, Optimizer, Activation, Loss) |
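The Dataset Splits row describes evaluating models on train-set fractions of 1, 5, 10, 25, 50, 75, and 100 percent while keeping the test set fixed. A minimal sketch of that subsampling protocol is below; the function name `subsample_train`, the toy data, and the seed handling are illustrative assumptions, not code from the paper.

```python
import random

def subsample_train(train_set, fraction, seed=0):
    # Illustrative helper (not from the paper): draw a random subset of the
    # training set at the given percentage. The test set is never touched,
    # so all fractions are evaluated against the same held-out data.
    rng = random.Random(seed)
    k = max(1, int(len(train_set) * fraction / 100))
    return rng.sample(train_set, k)

# Hypothetical stand-in data: (input, label) pairs.
train_set = [(i, i % 10) for i in range(1000)]
test_set = [(i, i % 10) for i in range(200)]  # fixed across all fractions

for pct in (1, 5, 10, 25, 50, 75, 100):
    subset = subsample_train(train_set, pct)
    # ...train a SPELA or backprop model on `subset`,
    #    then evaluate top-1 / top-5 accuracy on `test_set`...
    print(f"{pct}% of train data -> {len(subset)} examples")
```

Fixing the seed per fraction makes the subsets reproducible, which matters when comparing SPELA, SPELA 5x, and backpropagation on identical data budgets.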