Improving Continual Learning by Accurate Gradient Reconstructions of the Past

Authors: Erik Daxberger, Siddharth Swaroop, Kazuki Osawa, Rio Yokota, Richard E. Turner, José Miguel Hernández-Lobato, Mohammad Emtiyaz Khan

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this principle, we design a prior which combines two types of replay methods with a quadratic weight-regularizer and achieves better gradient reconstructions. The combination improves performance on standard task-incremental continual learning benchmarks such as Split-CIFAR, Split-TinyImageNet, and ImageNet-1000, achieving >80% of the batch performance by simply utilizing a memory of <10% of the past data. Our work shows that a good combination of the two strategies can be very effective in reducing forgetting.
Researcher Affiliation | Collaboration | Erik Daxberger (EMAIL), University of Cambridge & MPI for Intelligent Systems, Tübingen; Siddharth Swaroop (EMAIL), Harvard University; Kazuki Osawa (EMAIL), Google DeepMind; Rio Yokota (EMAIL), Tokyo Institute of Technology; Richard E. Turner (EMAIL), University of Cambridge; José Miguel Hernández-Lobato (EMAIL), University of Cambridge; Mohammad Emtiyaz Khan (EMAIL), RIKEN Center for AI Project
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using 'https://github.com/libffcv/ffcv-imagenet/' for the ImageNet training pipeline in footnote 6, but this is a third-party library. There is no explicit statement or link indicating that the authors provide open-source code for the methodology described in this paper.
Open Datasets | Yes | The combination improves performance on standard task-incremental continual learning benchmarks such as Split-CIFAR, Split-TinyImageNet, and ImageNet-1000, achieving >80% of the batch performance by simply utilizing a memory of <10% of the past data. ... Split-CIFAR (Zenke et al., 2017) ... CIFAR-10 (Krizhevsky et al., 2009) ... CIFAR-100 (Krizhevsky et al., 2009) ... Split-TinyImageNet by dividing TinyImageNet (Le & Yang, 2015) ... ImageNet-1000 benchmark proposed by Rebuffi et al. (2017), which randomly splits the full ImageNet dataset (Deng et al., 2009) ...
Dataset Splits | Yes | The first task is CIFAR-10 (Krizhevsky et al., 2009) with 50,000 training and 10,000 test data points across 10 classes. The subsequent 5 tasks are taken sequentially from CIFAR-100 (Krizhevsky et al., 2009), each with 5,000 training and 1,000 test data points across 10 classes. ... Each class has 500 data points split into training (80%) and validation (20%), and 50 test points (totalling 110,000 points).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | Yes | For training on each task, we use the ImageNet reference training pipeline (with the 40-epoch configuration) of the FFCV library (Leclerc et al., 2022). For all details of the training procedure, see https://github.com/libffcv/ffcv/ (commit f25386557e213711cc8601833add36ff966b80b2).
Experiment Setup | Yes | On each task, we train for 80 epochs using Adam with learning rate 10^-3 and batch size 256. ... On each task, we train for 70 epochs (with early stopping and exponential learning rate decay, without regularization) using SGD with momentum 0.9 and batch size 200. ... For ImageNet, we used T = 1.0, λ = 1.0 and τ = 0.16 for all methods.
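The quoted Split-TinyImageNet split sizes can be sanity-checked with a little arithmetic. This sketch assumes the standard 200-class TinyImageNet, which the quoted passage does not state explicitly:

```python
# Sanity check of the reported Split-TinyImageNet split sizes.
# Assumption: the standard 200-class TinyImageNet (not stated in the quote).
num_classes = 200
per_class = 500                               # data points per class before the split
train = int(per_class * 0.8) * num_classes    # 80% for training  -> 80,000
val = int(per_class * 0.2) * num_classes      # 20% for validation -> 20,000
test = 50 * num_classes                       # 50 test points per class -> 10,000
total = train + val + test
print(total)  # 110000, matching the paper's "totalling 110,000 points"
```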
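The hyperparameters quoted in the Experiment Setup row can be collected into a configuration sketch. The dictionary keys and the decay helper below are illustrative reconstructions, not the authors' code (which is not released), and the exact form of the exponential decay schedule is an assumption:

```python
# Hypothetical training configurations assembled from the quoted setup.
# All names are illustrative; only the numeric values come from the quotes.
SPLIT_CIFAR = {"optimizer": "Adam", "lr": 1e-3, "batch_size": 256, "epochs": 80}
TINY_IMAGENET = {"optimizer": "SGD", "momentum": 0.9, "batch_size": 200,
                 "epochs": 70, "early_stopping": True, "lr_decay": "exponential"}
IMAGENET = {"T": 1.0, "lambda": 1.0, "tau": 0.16}  # used for all methods

def exp_decayed_lr(base_lr: float, gamma: float, epoch: int) -> float:
    """Exponential learning-rate decay, lr_t = base_lr * gamma**t (assumed form)."""
    return base_lr * gamma ** epoch
```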