Learning without Isolation: Pathway Protection for Continual Learning
Authors: Zhikang Chen, Abudukelimu Wuerkaixi, Sen Cui, Haoxuan Li, Ding Li, Jingfeng Zhang, Bo Han, Gang Niu, Houfang Liu, Yi Yang, Sifan Yang, Changshui Zhang, Tianling Ren
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI. Our experiments on both CIFAR-100 and Tiny-ImageNet datasets demonstrate that our framework outperforms other methods. In the main text, we present the results of three experiments, including the application of the ResNet32 architecture to the CIFAR-100 dataset, and ResNet18 to both the CIFAR-100 and Tiny-ImageNet datasets. Table 1. Task-agnostic and task-aware accuracy (%) of different methods. Our approach is data-free, but results of exemplar-based methods are also provided. Table 2. Task-agnostic accuracy (%) of methods using minimum similarity matching on different layers. Table 3. Task-aware accuracy (%) of methods using minimum similarity matching on different layers. 5.5. Ablation Studies. To validate the effectiveness of different modules in our proposed method, LwI, we conducted ablation experiments on the model. |
| Researcher Affiliation | Academia | 1Tsinghua University, Beijing, P.R. China 2RIKEN 3Peking University 4The University of Auckland 5Hong Kong Baptist University. Correspondence to: Yi Yang <EMAIL>, Changshui Zhang <EMAIL>, Tianling Ren <EMAIL>. |
| Pseudocode | Yes | The overall process of our proposed method is illustrated in Algorithm 3 in the appendix. Algorithm 1: Model Fusion Process. Algorithm 2: Adaptive algorithm. Algorithm 3: LwI. |
| Open Source Code | Yes | The source code of our framework is accessible at https://github.com/chenzk202212/LwI. |
| Open Datasets | Yes | Experiments on popular benchmark datasets demonstrate the superiority of the proposed LwI. Our experiments on both CIFAR-100 and Tiny-ImageNet datasets demonstrate that our framework outperforms other methods. Datasets. Following the work (Masana et al., 2022), we evaluate different methods on benchmark datasets, including CIFAR-100 and Tiny-ImageNet. |
| Dataset Splits | Yes | Under the condition of continual learning, we use three task-splitting settings: 5 splits, 10 splits, and 20 splits. CIFAR-100: 5 splits corresponds to 20 classes per task, 10 splits to 10 classes per task, and 20 splits to 5 classes per task. Tiny-ImageNet: 5 splits corresponds to 40 classes per task, 10 splits to 20 classes per task, and 20 splits to 10 classes per task. The CIFAR-100 dataset contains 100 classes, each with 600 32×32 color images: 500 for training and 100 for testing. The Tiny-ImageNet dataset contains 200 classes, each with 500 64×64 color images: 400 used for training, 50 for validation, and 50 for testing. |
| Hardware Specification | Yes | In the experiments, we conduct all methods on a local Linux server that has two physical CPU chips (Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz) and 32 logical kernels. All methods are implemented using the PyTorch framework and all models are trained on GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | All methods are implemented using the PyTorch framework and all models are trained on GeForce RTX 2080 Ti GPUs. |
| Experiment Setup | Yes | Implementation Details. We trained the model for 200 epochs and optimized it with SGD, setting the batch size to 64. For rehearsal-based methods, we set 2000 exemplars selected using the herding method (Masana et al., 2022). During network training, the learning rate was initialized at 0.1 and decreased by a factor of 0.1 at the 80th and 120th epochs. The model architecture and training hyperparameters are the same across methods. When employing ResNet32, the momentum for the SGD optimizer was set to 0.9, while for ResNet18 it was set to 0.0. |
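The quoted learning-rate schedule (initial lr 0.1, multiplied by 0.1 at epochs 80 and 120 over a 200-epoch run) can be sketched as a plain step-decay function. This is a minimal illustration for reproducing the setup, not code from the paper; the helper name and defaults are ours, and in practice the same schedule is obtained with PyTorch's `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)`.

```python
def step_decay_lr(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Learning rate at a given epoch under step decay.

    Mirrors the reported setup: lr starts at base_lr and is
    multiplied by gamma at each milestone epoch.
    (Hypothetical helper, not from the paper's codebase.)
    """
    n_passed = sum(epoch >= m for m in milestones)
    return base_lr * (gamma ** n_passed)

# Spot-check the three phases of the 200-epoch run:
print(round(step_decay_lr(0), 4))    # 0.1   (epochs 0-79)
print(round(step_decay_lr(100), 4))  # 0.01  (epochs 80-119)
print(round(step_decay_lr(150), 4))  # 0.001 (epochs 120-199)
```
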