Overcoming the Stability Gap in Continual Learning

Authors: Md Yousuf Harun, Christopher Kanan

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In large-scale experiments for both easy and hard CL distributions (e.g., class incremental learning), we demonstrate that our method reduces the stability gap and greatly increases computational efficiency. Our main CIL results are given in Table 1. SGM with rehearsal shows the greatest reduction in the stability gap (S), plasticity gap (P), and continual knowledge gap (CK). It also performs best in other metrics.
Researcher Affiliation | Academia | Md Yousuf Harun (EMAIL), Rochester Institute of Technology; Christopher Kanan (EMAIL), University of Rochester
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured code-like steps.
Open Source Code | Yes | Code is available at https://yousuf907.github.io/sgmsite
Open Datasets | Yes | For this purpose, we use ImageNet-1K pre-trained models (K = 1000). ImageNet-1K (Russakovsky et al., 2015) has 1.28 million images from 1000 categories... Places365-LT (Liu et al., 2019) is a long-tailed dataset... Places365-Standard (Zhou et al., 2017) has over 1.8 million training images... CUB-200 (Wah et al., 2011) has RGB images of 200 bird species...
Dataset Splits | Yes | ImageNet-1K (Russakovsky et al., 2015) has 1.28 million images from 1000 categories, each with 732-1300 training images and 50 validation images. Places365-LT has 365 classes and 62500 training images with 5 to 4980 images per class. For its test set, we use the Places365-LT validation set from (Liu et al., 2019), which consists of a total of 7300 images with a balanced distribution of 20 images per class. Places365-Standard (Zhou et al., 2017) has over 1.8 million training images from 365 classes... We use the validation set consisting of 100 images per class to test the models. CUB-200 (Wah et al., 2011) has RGB images of 200 bird species with 5994 training images and 5794 test images.
Hardware Specification | Yes | We ran all experiments on the same hardware with a single GPU (NVIDIA RTX A5000).
Software Dependencies | No | The paper mentions using the AdamW optimizer, DeepSpeed, and a OneCycle learning rate scheduler, but does not provide specific version numbers for any software libraries, frameworks, or environments.
Experiment Setup | Yes | For both CIL and IID experiments, we train SGM with rehearsal, vanilla rehearsal, and output layer only using cross-entropy loss for 600 iterations per rehearsal session. During each iteration, the model is updated on 128 samples. All methods use the same ConvNeXt V2 backbone and the AdamW optimizer with a weight decay of 0.05 and initial learning rates of 10^-3 (SGM and vanilla) and 10^-2 (output layer only). The learning rate is reduced in earlier layers by a layer-wise decay factor of 0.9. The joint model (upper bound) is trained for 12500 iterations on all data, i.e., ImageNet-1K and Places365-LT combined, using an initial learning rate of 10^-4 without a scheduler. For all experiments, we set the rank of the LoRA weight matrices to 48. In all cases, all metrics are based on Top-1 accuracy (%).
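The layer-wise decay scheme quoted above can be made concrete with a short sketch. This is a hypothetical illustration (not the authors' released code): it only computes the per-layer learning rates implied by the setup, where each earlier layer's rate is the next layer's rate multiplied by the decay factor of 0.9. The function name and layer count are assumptions for the example.

```python
# Hypothetical sketch of layer-wise learning-rate decay (decay factor 0.9),
# as described in the experiment setup. Not the authors' implementation.

def layerwise_lrs(base_lr, num_layers, decay=0.9):
    """Return per-layer learning rates, output-side layer first.

    The layer closest to the output gets base_lr; each layer one step
    earlier in the network gets its successor's rate times `decay`,
    so earlier layers take smaller update steps.
    """
    return [base_lr * decay**i for i in range(num_layers)]

# Example: initial LR 10^-3 (SGM and vanilla rehearsal) across 4 layer groups.
lrs = layerwise_lrs(1e-3, 4)
# lrs[0] is 1e-3; lrs[1] is 9e-4; lrs[3] is 7.29e-4
```

In practice these rates would be assigned to per-layer parameter groups of an optimizer such as AdamW; the list above is just the schedule itself.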