Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

STAR: Stability-Inducing Weight Perturbation for Continual Learning

Authors: Masih Eskandar, Tooba Imtiaz, Davin Hill, Zifeng Wang, Jennifer Dy

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental We empirically show that STAR consistently improves performance of existing methods by up to 15% across varying baselines, and achieves superior or competitive accuracy to that of state-of-the-art methods aimed at improving rehearsal-based continual learning. Our implementation is available at https://github.com/Gnomy17/STAR_CL.
Researcher Affiliation Collaboration Masih Eskandar1, Tooba Imtiaz1, Davin Hill1, Zifeng Wang2, Jennifer Dy1. 1Department of Electrical & Computer Engineering, Northeastern University; 2Google Cloud AI Research. Correspondence to EMAIL
Pseudocode Yes A detailed pseudocode of our training algorithm can be found in Algorithm 1.
Open Source Code Yes Our implementation is available at https://github.com/Gnomy17/STAR_CL.
Open Datasets Yes We evaluate STAR on three mainstream CL benchmark datasets. Split-CIFAR10 and Split-CIFAR100 are the CIFAR10/100 datasets (Krizhevsky et al., 2009), split into 5 disjoint tasks of 2 and 20 classes respectively. Split-mini Imagenet is a subsampled version of the Imagenet dataset (Deng et al., 2009), split into 20 disjoint tasks of 5 classes each.
Dataset Splits Yes Split-CIFAR10 and Split-CIFAR100 are the CIFAR10/100 datasets (Krizhevsky et al., 2009), split into 5 disjoint tasks of 2 and 20 classes respectively. Split-mini Imagenet is a subsampled version of the Imagenet dataset (Deng et al., 2009), split into 20 disjoint tasks of 5 classes each. [...] For each task i, we measure L_FG(θ_ti, θ_t) for t > t_i on the test set for task i.
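The task partitioning described above (disjoint, equal-sized class groups per task) can be sketched as a small helper. This is not the authors' code; the function name `make_task_splits` and the sequential class ordering are assumptions for illustration only.

```python
# Hedged sketch: partition a label set into disjoint, equal-sized tasks,
# as in Split-CIFAR10/100 and Split-mini Imagenet. Not the authors' code;
# the actual class-to-task assignment may differ.
def make_task_splits(num_classes, num_tasks):
    classes_per_task = num_classes // num_tasks
    return [list(range(i * classes_per_task, (i + 1) * classes_per_task))
            for i in range(num_tasks)]

cifar10_tasks = make_task_splits(10, 5)        # 5 tasks of 2 classes
cifar100_tasks = make_task_splits(100, 5)      # 5 tasks of 20 classes
miniimagenet_tasks = make_task_splits(100, 20) # 20 tasks of 5 classes
```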
Hardware Specification Yes All experiments were run on a single Nvidia RTX A6000 GPU.
Software Dependencies No The paper does not provide specific software names with version numbers for its dependencies. It mentions using model architectures like ResNet18 and EfficientNet b2, but no specific software environment details with versions.
Experiment Setup Yes For CIFAR10-100 we use a batch size of 32 and 50 epochs per task. For mini Imagenet we use a batch size of 128 and 80 epochs per task. For a full list of hyperparameters, see appendix F. [...] We present our hyperparameters in table 8.
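The per-benchmark settings quoted above can be collected into a single configuration mapping. Only the batch sizes, epochs per task, and task counts below come from the paper's text; the dictionary keys and structure are illustrative assumptions, and the remaining hyperparameters (appendix F, table 8) are not reproduced here.

```python
# Hedged sketch: training settings quoted in the paper. Keys and layout
# are assumptions; other hyperparameters live in appendix F / table 8.
TRAIN_CONFIG = {
    "split_cifar10":      {"batch_size": 32,  "epochs_per_task": 50, "num_tasks": 5},
    "split_cifar100":     {"batch_size": 32,  "epochs_per_task": 50, "num_tasks": 5},
    "split_miniimagenet": {"batch_size": 128, "epochs_per_task": 80, "num_tasks": 20},
}
```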