SIESTA: Efficient Online Continual Learning with Sleep

Authors: Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Ronald Kemker, Christopher Kanan

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare SIESTA to a variety of baseline and state-of-the-art methods, including online learners REMIND (Hayes et al., 2020), ER (Chaudhry et al., 2019), SLDA (Hayes & Kanan, 2020), NCM (Mensink et al., 2013); incremental batch learners iCaRL (Rebuffi et al., 2017), BiC (Wu et al., 2019), End-to-End (Castro et al., 2018), WA (Zhao et al., 2020), PODNet (Douillard et al., 2020), Simple-DER (Li et al., 2021), DER (Yan et al., 2021), MEMO (Zhou et al., 2023b), FOSTER (Wang et al., 2022), DyTox (Douillard et al., 2022); and an offline learner.
Researcher Affiliation | Academia | Md Yousuf Harun (EMAIL), Rochester Institute of Technology; Jhair Gallardo (EMAIL), Rochester Institute of Technology; Tyler L. Hayes (EMAIL), Rochester Institute of Technology; Ronald Kemker (EMAIL), United States Space Force; Christopher Kanan (EMAIL), University of Rochester
Pseudocode | Yes | Figure 3: A high-level overview of SIESTA. During the Wake Phase, it transforms raw inputs into intermediate feature representations using network H. The inputs are then compressed with tensor quantization and cached. Then, weights belonging to recently seen classes in network F are updated with a running class mean using the output vectors from G. Finally, inference is performed on the current sample. During the Sleep Phase, a sampler uses a rehearsal policy to choose which examples should be reconstructed from the cached data for each mini-batch. Then, networks G and F are updated with backpropagation in a supervised manner. The wake/sleep cycles alternate.
Open Source Code | Yes | Code is available at https://yousuf907.github.io/siestasite
Open Datasets | Yes | We use four datasets in our experiments. For our main results, we use ImageNet ILSVRC-2012 (Russakovsky et al., 2015), which is the standard object recognition benchmark for testing a model's ability to scale. It has 1.28 million images uniformly distributed across 1000 categories. We study another large-scale scene recognition dataset, Places365-Standard, which is a subset of the Places-2 dataset (Zhou et al., 2017). We also use a long-tailed version of the Places-2 dataset, Places365-LT, to evaluate rehearsal policies (see Appendix D for details). Additionally, we evaluate performance on the CUB-200 (Wah et al., 2011) dataset in Appendix L.
Dataset Splits | Yes | ImageNet ILSVRC-2012 (Russakovsky et al., 2015), which is the standard object recognition benchmark... It has 1.28 million images uniformly distributed across 1000 categories. Places365-LT ... has a total of 365 categories with 62500 training images ranging from 5 to 4980 images per class. We use the Places365-LT validation set from Liu et al. (2019) as our test set, consisting of a total of 7300 images with a balanced distribution of 20 images per class. For the CUB-200 dataset, ... each model learns 200 classes in 4 steps where each step consists of 50 classes.
Hardware Specification | Yes | Without augmentations, training SIESTA requires only 1.9 hours on a single NVIDIA A5000 GPU.
Software Dependencies | No | SIESTA is trained with cross-entropy loss and uses SGD as its optimizer with the OneCycle learning rate (LR) scheduler (Smith & Topin, 2019) during each sleep phase... Following REMIND, we exclusively use reconstructed versions of the output of H during continual learning. The remaining 11 layers of MobileNetV3-L (97.81% of the DNN parameters) are trained during sleep.
Experiment Setup | Yes | For each sleep cycle, we use a batch size of 64, momentum 0.9, weight decay 1e-5, and an initial LR of 0.2 for the last layer. The LR is reduced in earlier layers by a layer-wise decay factor of 0.99. SIESTA uses the same universal setting, i.e., the same hyperparameters and same network configuration, for all learning phases.
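The wake-phase update quoted above (output-layer weights of F maintained as running class means over the embeddings produced by G) can be sketched in a few lines. This is a minimal illustration of the running-mean mechanism only, not the authors' implementation; the class name and methods are hypothetical.

```python
# Sketch of SIESTA's wake-phase update: each class's output-layer weight
# vector is the running mean of the embeddings (outputs of G) seen so far
# for that class, so it can be updated online without backpropagation.

class RunningClassMeans:
    def __init__(self, dim):
        self.dim = dim
        self.means = {}   # class label -> mean embedding
        self.counts = {}  # class label -> number of samples seen

    def update(self, label, embedding):
        """Fold one embedding into the class mean incrementally."""
        n = self.counts.get(label, 0)
        mean = self.means.get(label, [0.0] * self.dim)
        # Running-mean update: mean <- mean + (x - mean) / (n + 1)
        self.means[label] = [m + (x - m) / (n + 1)
                             for m, x in zip(mean, embedding)]
        self.counts[label] = n + 1

    def predict(self, embedding):
        """Score each class by dot product with its mean, as a linear output layer would."""
        def score(label):
            return sum(w * x for w, x in zip(self.means[label], embedding))
        return max(self.means, key=score)
```

Because the update is a closed-form incremental mean, the wake phase stays cheap: each sample costs one vector update plus one inference pass.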
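The layer-wise LR scheme in the experiment setup (initial LR 0.2 for the last layer, multiplied by 0.99 per earlier layer) amounts to a geometric decay over depth. A small sketch, with an illustrative function name not taken from the authors' code:

```python
# Layer-wise LR decay as described in the setup: the output layer gets the
# base LR of 0.2, and each layer farther from the output is scaled by 0.99.

def layerwise_lrs(num_layers, base_lr=0.2, decay=0.99):
    """Return per-layer LRs; index 0 is the earliest trainable layer,
    the last index is the output layer."""
    # Counting i layers back from the output, a layer's LR is base_lr * decay**i.
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]
```

With the 11 trainable MobileNetV3-L layers mentioned above, this yields LRs from roughly 0.2 * 0.99**10 at the earliest layer up to 0.2 at the output.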
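The sleep-phase training uses a OneCycle LR schedule (Smith & Topin, 2019). As a rough illustration of the shape only: a simplified warmup-then-anneal curve, not the exact OneCycle policy (the real schedule starts from a small nonzero LR and exposes several tunable parameters); the function and its parameters are hypothetical.

```python
import math

# Simplified OneCycle-style schedule: linear warmup to the peak LR for the
# first pct_start fraction of steps, then cosine annealing toward zero.

def one_cycle_lr(step, total_steps, max_lr=0.2, pct_start=0.3):
    warm = int(total_steps * pct_start)
    if step < warm:
        return max_lr * step / max(warm, 1)          # warmup ramp
    t = (step - warm) / max(total_steps - warm, 1)   # anneal progress in [0, 1]
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * t))
```

Each sleep cycle would run this schedule from scratch, consistent with the paper's universal setting of identical hyperparameters across learning phases.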