IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning

Authors: Prashant Shivaram Bhat, Bharath Chennamkulam Renjith, Elahe Arani, Bahram Zonooz

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate CL models under Class-Incremental Learning (Class-IL), Task-Incremental Learning (Task-IL), and Generalized Class-IL (GCIL) (Van de Ven & Tolias, 2019; Arani et al., 2022). More information on the datasets, task partition, and the corresponding network architecture used in these scenarios can be found in Appendix A. To provide a comprehensive analysis, we compare IMEX-Reg with several approaches that aim to improve generalization under low-buffer regimes in CL. We consider ER-ACE, GCR, and DRI (aimed at improving generalization); DER++ and CLS-ER (which use explicit consistency regularization by leveraging soft targets); and Co2L and OCDNet (representation and/or auxiliary CRL) as our baselines. Furthermore, we provide a lower bound, SGD, trained without any mechanism to minimize catastrophic forgetting, and an upper bound, Joint, where training is carried out on the entire dataset. We report the average accuracy, along with the standard deviation, on all tasks after CL training with three random seeds. We also provide the results of the forgetting analysis.
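The excerpt above reports average accuracy and a forgetting analysis. As a hedged sketch, these are the usual CL definitions of those metrics (not necessarily the paper's exact formulas): mean final accuracy over all tasks, and the average drop from each task's best past accuracy to its final accuracy.

```python
# Hedged sketch of standard CL evaluation metrics; the exact formulas used
# in the paper may differ in detail.
def average_accuracy(final_accs):
    """Mean accuracy over all tasks, measured after training on the last task."""
    return sum(final_accs) / len(final_accs)

def average_forgetting(A):
    """A[t][i]: accuracy on task i after training on task t (a T x T matrix).
    Forgetting for task i is its best earlier accuracy minus its final accuracy,
    averaged over all tasks except the last."""
    T = len(A)
    drops = [max(A[t][i] for t in range(i, T - 1)) - A[T - 1][i]
             for i in range(T - 1)]
    return sum(drops) / len(drops)
```

For example, with two tasks where task 0 drops from 0.9 to 0.7 after task 1 is learned, the average forgetting is 0.2.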
Researcher Affiliation Collaboration Prashant Bhat¹, Bharath Renjith¹, Elahe Arani¹,², Bahram Zonooz¹; ¹Eindhoven University of Technology (TU/e), ²Wayve. EMAIL, EMAIL
Pseudocode Yes Algorithm 1 Proposed Method: IMEX-Reg
1: Input: Data streams Dt, Model Φθ = {f, g, gmlp, h}, Hyperparameters α, β, and λ, Memory buffer Dm ← {}
2: for all tasks t ∈ {1, 2, ..., T} do
3:   for all iterations e ∈ {1, 2, ..., E} do
4:     L ← 0
5:     Sample a minibatch (Xt, Yt) ∼ Dt
6:     Ft = f(Xt)
7:     Ŷt, Zt, Ct = g(Ft), h(Ft), gmlp(g(Ft))
8:     if Dm ≠ ∅ then
9:       Sample a minibatch (Xm, Ym) ∼ Dm
10:      Fm = f(Xm)
11:      Ŷ, Z, C = g(Fm), h(Fm), gmlp(g(Fm))
12:      Fe = fe(Xm)
13:      Ŷe, Ze, Ce = ge(Fe), he(Fe), gmlpe(ge(Fe))
14:      L += λ [L_cr^g + L_cr^h]  ▷ Equation 4
15:    L += L_er + α L_rep + β L_ecr  ▷ Equations 1, 2, and 6
16:    Update Φθ and Dm
17:    Update θema  ▷ Equation 3
18: return model Φθ
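The control flow of Algorithm 1 can be sketched in plain Python. This is a hypothetical skeleton, not the authors' code: all networks (f, g, h, gmlp and their EMA counterparts) and loss terms are replaced by stubs, and only the loop structure and the EMA weight update (line 17 / Equation 3, with an illustrative decay value) mirror the pseudocode.

```python
# Hypothetical sketch of Algorithm 1's control flow (stubs, not the authors' code).
import random

def ema_update(theta_ema, theta, decay=0.999):
    """Line 17 / Eq. 3: elementwise moving average of model weights.
    The decay value is illustrative, not the paper's tuned hyperparameter."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(theta_ema, theta)]

def train_imex_reg(tasks, buffer, alpha, beta, lam, epochs=1):
    """Walk Algorithm 1's loops with stub forward passes and stub loss values."""
    total_updates = 0
    for task in tasks:                       # for all tasks t
        for _ in range(epochs):              # for all iterations e
            loss = 0.0                       # L <- 0
            x_t = random.choice(task)        # sample minibatch from D_t
            loss += 1.0 + alpha * 0.1        # stub: L_er + alpha * L_rep (Eqs. 1-2)
            if buffer:                       # if D_m is not empty
                x_m = random.choice(buffer)  # sample minibatch from D_m
                loss += lam * (0.1 + 0.1)    # stub: lambda * (L_cr^g + L_cr^h) (Eq. 4)
                loss += beta * 0.1           # stub: beta * L_ecr vs. EMA targets (Eq. 6)
            buffer.append(x_t)               # update D_m (reservoir sampling in practice)
            total_updates += 1               # update Phi_theta, then theta_ema (Eq. 3)
    return total_updates
```

Note that on the very first iteration the buffer is empty, so only the task loss applies; the consistency terms kick in once replay samples exist, matching the `if Dm ≠ ∅` branch.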
Open Source Code Yes Code is available at: https://github.com/NeurAI-Lab/IMEX-Reg.
Open Datasets Yes For Task-IL and Class-IL scenarios, we obtain Seq-CIFAR10, Seq-CIFAR100, and Seq-TinyImageNet by splitting CIFAR10, CIFAR100, and Tiny ImageNet into 5, 5, and 10 tasks of 2, 20, and 20 classes, respectively. The three datasets present progressively challenging scenarios (increasing the number of tasks or the number of classes per task) for a comprehensive analysis of different CL methods. Generalized Class-IL (GCIL) (Mi et al., 2020) exposes the model to a more challenging scenario by utilizing probabilistic distributions to sample data from the CIFAR100 dataset in each task. The CIFAR100 dataset is split into 20 tasks, with each task containing 1000 samples drawn from a maximum of 50 classes. GCIL provides two variations for the sample distribution, Uniform and Longtail (class imbalance). GCIL is the most realistic scenario, with varying numbers of classes per task and classes reappearing with different sample sizes.
Dataset Splits Yes For Task-IL and Class-IL scenarios, we obtain Seq-CIFAR10, Seq-CIFAR100, and Seq-TinyImageNet by splitting CIFAR10, CIFAR100, and Tiny ImageNet into 5, 5, and 10 tasks of 2, 20, and 20 classes, respectively. For GCIL, the CIFAR100 dataset is split into 20 tasks, with each task containing 1000 samples drawn from a maximum of 50 classes. GCIL provides two variations for the sample distribution, Uniform and Longtail (class imbalance).
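The Task-IL/Class-IL splits above partition a dataset's label space evenly across sequential tasks. A minimal sketch of such a partition (an assumed helper, not code from the paper or from Mammoth):

```python
# Hypothetical helper: split class ids 0..num_classes-1 into contiguous
# per-task groups, e.g. CIFAR100 -> 5 tasks of 20 classes (Seq-CIFAR100).
def split_classes(num_classes, num_tasks):
    assert num_classes % num_tasks == 0, "classes must divide evenly into tasks"
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]
```

For instance, `split_classes(10, 5)` yields the Seq-CIFAR10 partition (5 tasks of 2 classes), and `split_classes(200, 10)` the Seq-TinyImageNet one (10 tasks of 20 classes). The GCIL splits are not of this form, since they sample classes probabilistically per task.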
Hardware Specification Yes We trained all our models on an NVIDIA GeForce RTX 2080 Ti (11 GB).
Software Dependencies No We build on top of the Mammoth (Buzzega et al., 2020) CL repository in PyTorch. We use the same backbone as recent approaches in CL (Buzzega et al., 2020; Arani et al., 2022), i.e., a ResNet18 backbone without pretraining for all experiments.
Experiment Setup Yes We use the same backbone as recent approaches in CL (Buzzega et al., 2020; Arani et al., 2022), i.e., a ResNet18 backbone without pretraining for all experiments. We use a linear layer, a 3-layer MLP with BatchNorm and ReLU, and a 2-layer MLP for the classifier, projection head, and classifier projection, respectively. To ensure uniform experimental settings, we extended the Mammoth framework (Buzzega et al., 2020) and followed the same training scheme, such as the SGD optimizer, batch size, number of training epochs, and learning rate, for all experiments, unless otherwise specified. We employ random horizontal flip and random crop augmentations for supervised learning in Seq-CIFAR10, Seq-CIFAR100, Seq-TinyImageNet, and GCIL-CIFAR100 experiments. For contrastive representation learning in the projection head, we transform the input batch using a stochastic augmentation module consisting of a random resized crop and random horizontal flip followed by random color distortions. We trained all our models on an NVIDIA GeForce RTX 2080 Ti (11 GB). On average, it took around 2 hours to train IMEX-Reg on Seq-CIFAR10 and Seq-CIFAR100, and approximately 8 hours on Seq-TinyImageNet. Table 6 provides the best hyperparameters used to report the results in Table 1. In addition to these hyperparameters, we use a standard batch size of 32 and a minibatch size of 32 for all our experiments.
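The settings stated in the row above can be collected into a single config for reference. This is a hedged summary dictionary with illustrative key names; values the excerpt does not state (learning rate, epoch count, the Table 6 hyperparameters α, β, λ) are deliberately omitted rather than guessed.

```python
# Summary of the experimental setup as stated in the text; key names are
# illustrative, and unstated values (lr, epochs, alpha/beta/lambda) are omitted.
EXPERIMENT_CONFIG = {
    "backbone": "ResNet18 (no pretraining)",
    "classifier": "linear layer",
    "projection_head": "3-layer MLP with BatchNorm and ReLU",
    "classifier_projection": "2-layer MLP",
    "framework": "Mammoth (extended)",
    "optimizer": "SGD",
    "batch_size": 32,
    "minibatch_size": 32,
    "supervised_augmentations": ["random horizontal flip", "random crop"],
    "contrastive_augmentations": [
        "random resized crop",
        "random horizontal flip",
        "random color distortion",
    ],
    "hardware": "NVIDIA GeForce RTX 2080 Ti (11 GB)",
}
```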