Learn more, but bother less: parameter efficient continual learning

Authors: Fuli Qiao, Mehrdad Mahdavi

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our experimental results on continual learning benchmarks validate the efficacy of our proposed method, which outperforms existing state-of-the-art methods in reducing forgetting, enhancing task performance, and preserving the model's ability to generalize to unseen tasks." |
| Researcher Affiliation | Academia | Fuli Qiao, Pennsylvania State University; Mehrdad Mahdavi, Pennsylvania State University. |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. Figure 2 is a framework overview diagram, not pseudocode. |
| Open Source Code | No | The NeurIPS checklist states: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We use open-source datasets and models, but do not attach the code." |
| Open Datasets | Yes | "We evaluate our approach using a CL benchmark specifically designed for language models. This benchmark comprises five text classification datasets: AG News, Amazon Reviews, Yelp Reviews, DBpedia, and Yahoo Answers, as introduced by [51]." Table 6 details the 15 datasets used in the continual learning (CL) experiments, along with their evaluation metrics; the selection draws on the standard CL benchmark [51], the GLUE [42] and SuperGLUE [41] benchmarks, plus the added IMDB movie reviews dataset. |
| Dataset Splits | Yes | "For each task, we train using 1000 randomly selected samples and validate using 500 samples per class, following the methodology of [35]." |
| Hardware Specification | Yes | "All our experiments involving T5 models were performed on a server outfitted with four NVIDIA A6000 GPUs, utilizing the DeepSpeed repository for implementation." |
| Software Dependencies | No | The paper mentions utilizing the DeepSpeed repository but does not specify a version number for DeepSpeed or any other software dependency. |
| Experiment Setup | Yes | For every sequence of tasks across different orders, the experimental setup was standardized as follows: a constant learning rate of 1e-3 was maintained throughout; the total batch size was 32, distributed as 8 per GPU across the four A6000 GPUs; the dropout rate was 0.1; a regularization rate of 0.1 was applied to the orthogonal matrices derived from the Singular Value Decomposition (SVD); and a rate of 0.0 was employed for the weight penalty, indicating no additional penalty on the model's weights during training. |
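Since the authors did not release code, the reported hyperparameters can only be illustrated, not reproduced exactly. The sketch below collects the stated values into a configuration dictionary and shows one plausible, generic form of the SVD-orthogonality regularizer mentioned in the setup (the Frobenius penalty ‖UᵀU − I‖²_F); the function name and the exact penalty form are assumptions, not the paper's method.

```python
import numpy as np

# Hyperparameters as reported in the paper's Experiment Setup
# (hypothetical mapping onto config keys; names are not from the paper).
CONFIG = {
    "learning_rate": 1e-3,      # constant throughout training
    "total_batch_size": 32,     # 8 per GPU x 4 NVIDIA A6000 GPUs
    "per_gpu_batch_size": 8,
    "dropout": 0.1,
    "svd_orth_reg": 0.1,        # rate applied to SVD orthogonal matrices
    "weight_decay": 0.0,        # no additional penalty on model weights
}

def orthogonal_penalty(u: np.ndarray) -> float:
    """Generic orthogonality regularizer: ||U^T U - I||_F^2.

    Assumed here as one plausible way to regularize an orthogonal
    matrix from an SVD; zero when U has orthonormal columns.
    """
    k = u.shape[1]
    gram = u.T @ u
    return float(np.linalg.norm(gram - np.eye(k), "fro") ** 2)

# A matrix with orthonormal columns (from QR) incurs ~zero penalty,
# so the regularization term svd_orth_reg * penalty vanishes.
q, _ = np.linalg.qr(np.random.randn(6, 3))
reg_term = CONFIG["svd_orth_reg"] * orthogonal_penalty(q)
```

In a training loop, `reg_term` would simply be added to the task loss; with the reported rate of 0.1 it nudges the decomposed factors toward orthonormality without constraining them exactly.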