SIESTA: Efficient Online Continual Learning with Sleep

Authors: Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Ronald Kemker, Christopher Kanan

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare SIESTA to a variety of baseline and state-of-the-art methods, including online learners REMIND (Hayes et al., 2020), ER (Chaudhry et al., 2019), SLDA (Hayes & Kanan, 2020), NCM (Mensink et al., 2013); incremental batch learners iCaRL (Rebuffi et al., 2017), BiC (Wu et al., 2019), End-to-End (Castro et al., 2018), WA (Zhao et al., 2020), PODNet (Douillard et al., 2020), Simple-DER (Li et al., 2021), DER (Yan et al., 2021), MEMO (Zhou et al., 2023b), FOSTER (Wang et al., 2022), DyTox (Douillard et al., 2022); and an offline learner.
Researcher Affiliation | Academia | Md Yousuf Harun (EMAIL), Rochester Institute of Technology; Jhair Gallardo (EMAIL), Rochester Institute of Technology; Tyler L. Hayes (EMAIL), Rochester Institute of Technology; Ronald Kemker (EMAIL), United States Space Force; Christopher Kanan (EMAIL), University of Rochester
Pseudocode | Yes | Figure 3: A high-level overview of SIESTA. During the Wake Phase, it transforms raw inputs into intermediate feature representations using network H. The inputs are then compressed with tensor quantization and cached. Then, weights belonging to recently seen classes in network F are updated with a running class mean using the output vectors from G. Finally, inference is performed on the current sample. During the Sleep Phase, a sampler uses a rehearsal policy to choose which examples should be reconstructed from the cached data for each mini-batch. Then, networks G and F are updated with backpropagation in a supervised manner. The wake/sleep cycles alternate.
Open Source Code | Yes | Code is available at https://yousuf907.github.io/siestasite
Open Datasets | Yes | We use four datasets in our experiments. For our main results, we use ImageNet ILSVRC-2012 (Russakovsky et al., 2015), which is the standard object recognition benchmark for testing a model's ability to scale. It has 1.28 million images uniformly distributed across 1000 categories. We study another large-scale scene recognition dataset, Places365-Standard, which is a subset of the Places-2 dataset (Zhou et al., 2017). We also use a long-tailed version of the Places-2 dataset, Places365-LT, to evaluate rehearsal policies (see Appendix D for details). Additionally, we evaluate performance on the CUB-200 (Wah et al., 2011) dataset in Appendix L.
Dataset Splits | Yes | ImageNet ILSVRC-2012 (Russakovsky et al., 2015), which is the standard object recognition benchmark... It has 1.28 million images uniformly distributed across 1000 categories. Places365-LT ... has a total of 365 categories with 62500 training images ranging from 5 to 4980 images per class. We use the Places365-LT validation set from Liu et al. (2019) as our test set, consisting of a total of 7300 images with a balanced distribution of 20 images per class. For the CUB-200 dataset, ... each model learns 200 classes in 4 steps where each step consists of 50 classes.
Hardware Specification | Yes | Without augmentations, training SIESTA requires only 1.9 hours on a single NVIDIA A5000 GPU.
Software Dependencies | No | SIESTA is trained with cross-entropy loss and uses SGD as its optimizer with the OneCycle learning rate (LR) scheduler (Smith & Topin, 2019) during each sleep phase... Following REMIND, we exclusively use reconstructed versions of the output of H during continual learning. The remaining 11 layers of MobileNetV3-L (97.81% of the DNN parameters) are trained during sleep.
Experiment Setup | Yes | For each sleep cycle, we use a batch size of 64, momentum 0.9, weight decay 1e-5, and an initial LR of 0.2 for the last layer. The LR is reduced in earlier layers by a layer-wise decay factor of 0.99. SIESTA uses the same universal setting, i.e., the same hyperparameters and same network configuration, for all learning phases.
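The wake-phase update quoted above (output-layer weights of F maintained as running class means over the embeddings produced by G) can be sketched in a few lines. This is a minimal illustration of the running-mean mechanism only, not the authors' implementation; the class name and methods are hypothetical.

```python
# Sketch of SIESTA's wake-phase update: each class's output-layer weight
# vector is the running mean of the embeddings (outputs of G) seen so far
# for that class, so it can be updated online without backpropagation.

class RunningClassMeans:
    def __init__(self, dim):
        self.dim = dim
        self.means = {}   # class label -> mean embedding
        self.counts = {}  # class label -> number of samples seen

    def update(self, label, embedding):
        """Fold one embedding into the class mean incrementally."""
        n = self.counts.get(label, 0)
        mean = self.means.get(label, [0.0] * self.dim)
        # Running-mean update: mean <- mean + (x - mean) / (n + 1)
        self.means[label] = [m + (x - m) / (n + 1)
                             for m, x in zip(mean, embedding)]
        self.counts[label] = n + 1

    def predict(self, embedding):
        """Score each class by dot product with its mean, as a linear output layer would."""
        def score(label):
            return sum(w * x for w, x in zip(self.means[label], embedding))
        return max(self.means, key=score)
```

Because the update is a closed-form incremental mean, the wake phase stays cheap: each sample costs one vector update plus one inference pass.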
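The layer-wise LR scheme in the experiment setup (initial LR 0.2 for the last layer, multiplied by 0.99 per earlier layer) amounts to a geometric decay over depth. A small sketch, with an illustrative function name not taken from the authors' code:

```python
# Layer-wise LR decay as described in the setup: the output layer gets the
# base LR of 0.2, and each layer farther from the output is scaled by 0.99.

def layerwise_lrs(num_layers, base_lr=0.2, decay=0.99):
    """Return per-layer LRs; index 0 is the earliest trainable layer,
    the last index is the output layer."""
    # Counting i layers back from the output, a layer's LR is base_lr * decay**i.
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]
```

With the 11 trainable MobileNetV3-L layers mentioned above, this yields LRs from roughly 0.2 * 0.99**10 at the earliest layer up to 0.2 at the output.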
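The sleep-phase training uses a OneCycle LR schedule (Smith & Topin, 2019). As a rough illustration of the shape only: a simplified warmup-then-anneal curve, not the exact OneCycle policy (the real schedule starts from a small nonzero LR and exposes several tunable parameters); the function and its parameters are hypothetical.

```python
import math

# Simplified OneCycle-style schedule: linear warmup to the peak LR for the
# first pct_start fraction of steps, then cosine annealing toward zero.

def one_cycle_lr(step, total_steps, max_lr=0.2, pct_start=0.3):
    warm = int(total_steps * pct_start)
    if step < warm:
        return max_lr * step / max(warm, 1)          # warmup ramp
    t = (step - warm) / max(total_steps - warm, 1)   # anneal progress in [0, 1]
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * t))
```

Each sleep cycle would run this schedule from scratch, consistent with the paper's universal setting of identical hyperparameters across learning phases.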