Learn more, but bother less: parameter efficient continual learning

Authors: Fuli Qiao, Mehrdad Mahdavi

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experimental results on continual learning benchmarks validate the efficacy of our proposed method, which outperforms existing state-of-the-art methods in reducing forgetting, enhancing task performance, and preserving the model's ability to generalize to unseen tasks." |
| Researcher Affiliation | Academia | Fuli Qiao, Pennsylvania State University; Mehrdad Mahdavi, Pennsylvania State University. |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. Figure 2 is a framework overview diagram, not pseudocode. |
| Open Source Code | No | The NeurIPS checklist states: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We use open-source datasets and models, but do not attach the code." |
| Open Datasets | Yes | "We evaluate our approach using a CL benchmark specifically designed for language models. This benchmark comprises five text classification datasets: AG News, Amazon Reviews, Yelp Reviews, DBpedia, and Yahoo Answers, as introduced by [51]." Table 6 details the 15 datasets used in the continual learning (CL) experiments, along with their evaluation metrics; the selection draws on the standard CL benchmark [51], the GLUE [42] and SuperGLUE [41] benchmarks, plus the added IMDB movie reviews dataset. |
| Dataset Splits | Yes | "For each task, we train using 1000 randomly selected samples and validate using 500 samples per class, following the methodology of [35]." |
| Hardware Specification | Yes | "All our experiments involving T5 models were performed on a server outfitted with four NVIDIA A6000 GPUs, utilizing the DeepSpeed repository for implementation." |
| Software Dependencies | No | The paper mentions utilizing the DeepSpeed repository but does not specify a version number for DeepSpeed or any other software dependency. |
| Experiment Setup | Yes | For every sequence of tasks across different orders, the experimental setup was standardized as follows: a constant learning rate of 1e-3 was maintained throughout; the total batch size was 32, distributed as 8 per GPU across the four A6000 GPUs; the dropout rate was 0.1; a regularization rate of 0.1 was applied to the orthogonal matrices derived from the Singular Value Decomposition (SVD); and a rate of 0.0 was employed for the weight penalty, indicating no additional penalty on the model's weights during training. |
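Since the authors did not release code, the reported hyperparameters can only be illustrated, not reproduced exactly. The sketch below collects the stated values into a configuration dictionary and shows one plausible, generic form of the SVD-orthogonality regularizer mentioned in the setup (the Frobenius penalty ‖UᵀU − I‖²_F); the function name and the exact penalty form are assumptions, not the paper's method.

```python
import numpy as np

# Hyperparameters as reported in the paper's Experiment Setup
# (hypothetical mapping onto config keys; names are not from the paper).
CONFIG = {
    "learning_rate": 1e-3,      # constant throughout training
    "total_batch_size": 32,     # 8 per GPU x 4 NVIDIA A6000 GPUs
    "per_gpu_batch_size": 8,
    "dropout": 0.1,
    "svd_orth_reg": 0.1,        # rate applied to SVD orthogonal matrices
    "weight_decay": 0.0,        # no additional penalty on model weights
}

def orthogonal_penalty(u: np.ndarray) -> float:
    """Generic orthogonality regularizer: ||U^T U - I||_F^2.

    Assumed here as one plausible way to regularize an orthogonal
    matrix from an SVD; zero when U has orthonormal columns.
    """
    k = u.shape[1]
    gram = u.T @ u
    return float(np.linalg.norm(gram - np.eye(k), "fro") ** 2)

# A matrix with orthonormal columns (from QR) incurs ~zero penalty,
# so the regularization term svd_orth_reg * penalty vanishes.
q, _ = np.linalg.qr(np.random.randn(6, 3))
reg_term = CONFIG["svd_orth_reg"] * orthogonal_penalty(q)
```

In a training loop, `reg_term` would simply be added to the task loss; with the reported rate of 0.1 it nudges the decomposed factors toward orthonormality without constraining them exactly.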