Exploring the potential of Direct Feedback Alignment for Continual Learning

Authors: Sara Folchini, Viplove Arora, Sebastian Goldt

TMLR 2025

Reproducibility Variable — Result — LLM Response
Research Type — Experimental. We train fully-connected networks on several continual learning benchmarks using DFA and compare its performance to vanilla backpropagation, random features, and other continual learning algorithms. We empirically show that DFA is competitive at continual learning with vanilla back-propagation and other baselines, such as random features (RF) and Elastic Weight Consolidation (EWC).
Researcher Affiliation — Academia. Sara Folchini (EMAIL), Viplove Arora (EMAIL), and Sebastian Goldt (EMAIL), International School for Advanced Studies (SISSA), Trieste, Italy.
Pseudocode — No. The paper describes the mathematical equations for the BP and DFA weight updates (e.g., Equations 1-4) but does not present them within a clearly labeled 'Pseudocode' or 'Algorithm' block.
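Although the paper gives the update rules only as equations, the standard DFA rule it builds on (a fixed random feedback matrix per hidden layer carrying the output error straight back, in the style of Nøkland, 2016) can be sketched in a few lines. The layer sizes, learning rate behavior, and synthetic sample below are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-hidden-layer network; the shapes are toy values, not the paper's.
n_in, n_h, n_out, lr = 8, 16, 4, 0.01
W1 = rng.normal(0, np.sqrt(1 / n_in), (n_h, n_in))
W2 = rng.normal(0, np.sqrt(1 / n_h), (n_h, n_h))
W3 = rng.normal(0, np.sqrt(1 / n_h), (n_out, n_h))

# Fixed random feedback matrices: one per hidden layer, projecting the
# output error directly back to that layer. They are never trained.
B1 = rng.normal(0, 0.1, (n_h, n_out))
B2 = rng.normal(0, 0.1, (n_h, n_out))

relu = lambda a: np.maximum(a, 0.0)

def dfa_step(x, y):
    """One DFA update on a single (x, y) pair; y is a one-hot target."""
    global W1, W2, W3
    # Forward pass: ReLU hidden layers, softmax output, cross-entropy loss.
    a1 = W1 @ x;  h1 = relu(a1)
    a2 = W2 @ h1; h2 = relu(a2)
    logits = W3 @ h2
    p = np.exp(logits - logits.max()); p /= p.sum()
    e = p - y                        # output error (softmax + CE gradient)
    # DFA: each hidden layer receives the error through its own fixed B,
    # instead of the transposed forward weights backprop would use.
    d2 = (B2 @ e) * (a2 > 0)
    d1 = (B1 @ e) * (a1 > 0)
    W3 -= lr * np.outer(e, h2)
    W2 -= lr * np.outer(d2, h1)
    W1 -= lr * np.outer(d1, x)
    return -np.log(p[y.argmax()])    # cross-entropy loss on this sample

x = rng.normal(size=n_in)
y = np.eye(n_out)[1]
losses = [dfa_step(x, y) for _ in range(50)]
```

The only difference from backprop here is the backward pass: replacing `B2 @ e` and `B1 @ e` with products of the transposed forward weights recovers the usual gradient.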
Open Source Code — No. The paper does not include any explicit statement about code availability, links to repositories, or mentions of code in supplementary materials for the methodology described.
Open Datasets — Yes. We report results on the Fashion-MNIST (FMNIST) dataset (Xiao et al., 2017), the CIFAR10 dataset (Krizhevsky, 2009), and the MNIST dataset (Deng, 2012).
Dataset Splits — Yes. Split FMNIST (sFMNIST) and split CIFAR10, where we split the original dataset of 10 classes into five smaller datasets with two disjoint classes each. The resulting smaller datasets have very different statistical characteristics, so a model trained sequentially on them needs to incrementally learn new information with dramatically different feature representations.
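The split-task construction described in this row is straightforward to reproduce. Below is a minimal sketch that partitions a 10-class label set into five disjoint 2-class tasks; the function name, toy labels, and consecutive class pairing are assumptions, not the authors' code:

```python
import numpy as np

def make_split_tasks(labels, classes_per_task=2):
    """Partition a multi-class dataset into disjoint class-pair tasks,
    returning (class_pair, example_indices) for each task."""
    classes = np.unique(labels)                    # e.g. 0..9
    assert len(classes) % classes_per_task == 0
    tasks = []
    for i in range(0, len(classes), classes_per_task):
        pair = classes[i:i + classes_per_task]     # e.g. (0, 1), (2, 3), ...
        idx = np.flatnonzero(np.isin(labels, pair))
        tasks.append((tuple(pair), idx))
    return tasks

# Toy stand-in for FMNIST/CIFAR10 labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
tasks = make_split_tasks(labels)
print(len(tasks))  # prints 5
```

A model is then trained on the five tasks sequentially, which is exactly the regime where statistics differ sharply between tasks and catastrophic forgetting can occur.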
Hardware Specification — No. The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing specifications used to run the experiments.
Software Dependencies — No. The paper does not explicitly mention software dependencies with specific version numbers, such as programming languages or deep learning frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup — Yes. In our experiments, we use 3-layer fully-connected networks with 1000 neurons in each hidden layer. We train the networks for a maximum of 1000 epochs (the impact of this choice is expanded on in Appendix F) and apply early stopping, halting training as soon as the network exceeds 99% training accuracy. All layers are initialized using the Xavier uniform initialization (Glorot & Bengio, 2010). We choose a logistic activation function in the output layer and ReLU in the other layers. The loss function is cross-entropy. For DFA, we use a learning rate of 0.01 and a feedback-matrix variance optimized in the range between the orders of 1e-8 and 1. For backpropagation ... a Dropout layer (Srivastava et al., 2014) after each layer ... (a rate of 0.2 in the first layer and 0.5 in the other layers, excluding the output layer). We perform a grid-search optimization for the learning rate in the range between 1e-2 and 1e-4. For random features ... we use a learning rate of 1e-2. For elastic weight consolidation (EWC) ... we chose a learning rate of 1e-3 and an importance of 1000; lambda is set to 0.4 by default ...
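The quoted architecture can be condensed into a short sketch. The snippet below mirrors the stated setup (3-layer fully-connected network, 1000 hidden units per layer, Xavier uniform initialization, ReLU hidden layers, logistic output, early stopping at 99% training accuracy or 1000 epochs) with an assumed 784-dimensional input; it is an illustration of the protocol, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Xavier/Glorot uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

# 3-layer fully-connected net with 1000 units per hidden layer, as stated.
# The 784-dimensional input is an assumption (it matches MNIST/FMNIST).
dims = [784, 1000, 1000, 10]
weights = [xavier_uniform(i, o) for i, o in zip(dims[:-1], dims[1:])]

def forward(x):
    """ReLU in the hidden layers, logistic (sigmoid) activation at the output."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return 1.0 / (1.0 + np.exp(-(weights[-1] @ h)))

# Stopping criterion from the quoted protocol: halt at >99% training
# accuracy, or after at most 1000 epochs.
max_epochs, target_acc = 1000, 0.99
```

The training loop itself would then differ only in the backward pass: transposed forward weights for BP, fixed random feedback matrices for DFA, or a frozen first layer for random features.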