Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

STAR: Stability-Inducing Weight Perturbation for Continual Learning

Authors: Masih Eskandar, Tooba Imtiaz, Davin Hill, Zifeng Wang, Jennifer Dy

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental We empirically show that STAR consistently improves performance of existing methods by up to 15% across varying baselines, and achieves superior or competitive accuracy to that of state-of-the-art methods aimed at improving rehearsal-based continual learning. Our implementation is available at https://github.com/Gnomy17/STAR_CL.
Researcher Affiliation Collaboration Masih Eskandar1, Tooba Imtiaz1, Davin Hill1, Zifeng Wang2, Jennifer Dy1. 1Department of Electrical & Computer Engineering, Northeastern University; 2Google Cloud AI Research. Correspondence to EMAIL
Pseudocode Yes A detailed pseudocode of our training algorithm can be found in Algorithm 1.
Open Source Code Yes Our implementation is available at https://github.com/Gnomy17/STAR_CL.
Open Datasets Yes We evaluate STAR on three mainstream CL benchmark datasets. Split-CIFAR10 and Split-CIFAR100 are the CIFAR10/100 datasets (Krizhevsky et al., 2009), split into 5 disjoint tasks of 2 and 20 classes respectively. Split-mini Imagenet is a subsampled version of the Imagenet dataset (Deng et al., 2009), split into 20 disjoint tasks of 5 classes each.
Dataset Splits Yes Split-CIFAR10 and Split-CIFAR100 are the CIFAR10/100 datasets (Krizhevsky et al., 2009), split into 5 disjoint tasks of 2 and 20 classes respectively. Split-mini Imagenet is a subsampled version of the Imagenet dataset (Deng et al., 2009), split into 20 disjoint tasks of 5 classes each. [...] For each task i, we measure L_FG(θ_ti, θ_t) for t > t_i on the test set for task i.
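The task partitioning described above (disjoint, equal-sized class groups per task) can be sketched as a small helper. This is not the authors' code; the function name `make_task_splits` and the sequential class ordering are assumptions for illustration only.

```python
# Hedged sketch: partition a label set into disjoint, equal-sized tasks,
# as in Split-CIFAR10/100 and Split-mini Imagenet. Not the authors' code;
# the actual class-to-task assignment may differ.
def make_task_splits(num_classes, num_tasks):
    classes_per_task = num_classes // num_tasks
    return [list(range(i * classes_per_task, (i + 1) * classes_per_task))
            for i in range(num_tasks)]

cifar10_tasks = make_task_splits(10, 5)        # 5 tasks of 2 classes
cifar100_tasks = make_task_splits(100, 5)      # 5 tasks of 20 classes
miniimagenet_tasks = make_task_splits(100, 20) # 20 tasks of 5 classes
```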
Hardware Specification Yes All experiments were run on a single Nvidia RTX A6000 GPU.
Software Dependencies No The paper does not provide specific software names with version numbers for its dependencies. It mentions using model architectures like ResNet18 and EfficientNet b2, but no specific software environment details with versions.
Experiment Setup Yes For CIFAR10-100 we use a batch size of 32 and 50 epochs per task. For mini Imagenet we use a batch size of 128 and 80 epochs per task. For a full list of hyperparameters, see appendix F. [...] We present our hyperparameters in table 8.
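The per-benchmark settings quoted above can be collected into a single configuration mapping. Only the batch sizes, epochs per task, and task counts below come from the paper's text; the dictionary keys and structure are illustrative assumptions, and the remaining hyperparameters (appendix F, table 8) are not reproduced here.

```python
# Hedged sketch: training settings quoted in the paper. Keys and layout
# are assumptions; other hyperparameters live in appendix F / table 8.
TRAIN_CONFIG = {
    "split_cifar10":      {"batch_size": 32,  "epochs_per_task": 50, "num_tasks": 5},
    "split_cifar100":     {"batch_size": 32,  "epochs_per_task": 50, "num_tasks": 5},
    "split_miniimagenet": {"batch_size": 128, "epochs_per_task": 80, "num_tasks": 20},
}
```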