TinySubNets: An Efficient and Low Capacity Continual Learning Strategy

Authors: Marcin Pietron, Kamil Faber, Dominik Żurek, Roberto Corizzo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results involving common benchmark CL datasets and scenarios show that our proposed strategy achieves better results in terms of accuracy than existing state-of-the-art CL strategies."
Researcher Affiliation | Academia | "1AGH University of Krakow, Krakow, Poland; 2American University, Washington DC, USA; EMAIL, EMAIL, EMAIL, EMAIL"
Pseudocode | Yes | "Algorithm 1: Tiny Sub Networks (TSN) main algorithm; Algorithm 2: Fine-tuned pruning for task t"
Open Source Code | Yes | "The code of the method is available at the following public repository: https://github.com/lifelonglab/tinysubnets"
Open Datasets | Yes | "Our main experiments were performed with three popular and commonly adopted CL scenarios: Permuted MNIST (p-MNIST) (Cun 1998), split CIFAR100 (s-CIFAR100) (Krizhevsky 2009), and 5 datasets (Ebrahimi et al. 2020), a task-incremental scenario consisting of MNIST, SVHN, FashionMNIST, CIFAR10, and notMNIST. Additional experiments use Tiny Imagenet (Le and Yang 2015) and Imagenet100 (Russakovsky et al. 2015)."
Dataset Splits | No | The paper describes how tasks are split (e.g., "p-MNIST scenario consists of 10 tasks with randomly permuted pixels", "s-CIFAR100 is divided into 10 tasks with 10 classes each") and mentions a "validation set" in Algorithm 2, but does not provide specific percentages or absolute sample counts for training/validation/test splits within each task.
Hardware Specification | Yes | "Experiments are executed on a workstation equipped with an NVIDIA A100 GPU."
Software Dependencies | No | The paper mentions the use of an "SGD optimizer" and an "ADAM optimizer" but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "The hyperparameters used in our experiments were set up as follows: adaptive learning rate: for p-MNIST it starts from 3e-1 and ends at 1e-4; for CIFAR100 and 5 datasets it starts from 1e-2 and ends at 1e-4; for the remaining scenarios it lies between 1e-3 and 1e-5. Number of epochs per task: 200 for each scenario. Batch size: CIFAR100 2048, Imagenet100 32; 5 datasets, p-MNIST, and Tiny Imagenet 256. Fine-tuning parameters: 50 iterations for each scenario, α = 0.95, β = 0.95, Kullback-Leibler threshold set empirically to allow at most two memory banks. Initial capacity per task: 0.55 for CIFAR100, 0.5 for the remaining scenarios. Frequency (in batches) to trigger adaptive quantization: three times per epoch for each scenario. Task replay memory size: 50 samples per task (only 15 samples per task for Tiny Imagenet). Quantity of sparsity change: 0.01 for each continual learning benchmark. The SGD optimizer was used for the p-MNIST dataset; the ADAM optimizer was adopted for the other scenarios."
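The reported experiment setup can be collected into a single configuration sketch. This is a minimal illustration, not code from the authors' repository: the dictionary layout, key names, and the `config_for` helper are our own assumptions, while the values are the ones quoted above.

```python
# Illustrative collection of the paper's reported hyperparameters.
# Layout and key names are our own; only the values come from the paper.

FINE_TUNING = {"iterations": 50, "alpha": 0.95, "beta": 0.95}
EPOCHS_PER_TASK = 200          # same for every scenario
SPARSITY_STEP = 0.01           # quantity of sparsity change per adjustment
QUANT_TRIGGERS_PER_EPOCH = 3   # adaptive-quantization frequency

SCENARIOS = {
    "p-MNIST":       dict(lr_start=3e-1, lr_end=1e-4, optimizer="SGD",
                          batch_size=256,  capacity=0.50, replay=50),
    "s-CIFAR100":    dict(lr_start=1e-2, lr_end=1e-4, optimizer="ADAM",
                          batch_size=2048, capacity=0.55, replay=50),
    "5-datasets":    dict(lr_start=1e-2, lr_end=1e-4, optimizer="ADAM",
                          batch_size=256,  capacity=0.50, replay=50),
    "Tiny-Imagenet": dict(lr_start=1e-3, lr_end=1e-5, optimizer="ADAM",
                          batch_size=256,  capacity=0.50, replay=15),
    "Imagenet100":   dict(lr_start=1e-3, lr_end=1e-5, optimizer="ADAM",
                          batch_size=32,   capacity=0.50, replay=50),
}

def config_for(scenario: str) -> dict:
    """Return the per-scenario config merged with the shared settings."""
    cfg = dict(SCENARIOS[scenario])
    cfg.update(epochs_per_task=EPOCHS_PER_TASK,
               fine_tuning=FINE_TUNING,
               sparsity_step=SPARSITY_STEP,
               quant_triggers_per_epoch=QUANT_TRIGGERS_PER_EPOCH)
    return cfg
```

Collecting the values this way makes the reproducibility gaps visible as well: everything above is stated in the paper, whereas library versions and per-task data splits would have to be added from the public repository.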