Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning

Authors: Mohammadamin Banayeeanzade, Mahdi Soltanolkotabi, Mohammad Rostami

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, through extensive empirical evaluations, we demonstrate that our theoretical findings are also applicable to deep neural networks, offering valuable guidance for designing MTL and CL models in practice." |
| Researcher Affiliation | Academia | "Amin Banayeeanzade EMAIL, Department of Computer Science, University of Southern California; Mahdi Soltanolkotabi EMAIL, Department of Electrical and Computer Engineering, University of Southern California; Mohammad Rostami EMAIL, Department of Computer Science, University of Southern California" |
| Pseudocode | No | The paper includes mathematical formulations and derivations but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code is available at https://github.com/aminbana/MTL-Theory." |
| Open Datasets | Yes | "We empirically show that our findings for the linear models are generalizable to DNNs by conducting various experiments using different architectures and practical datasets such as CIFAR-100, ImageNet-R, and CUB-200. ... We use CIFAR-100, ImageNet-R (IN-R), and CUB-200 datasets in our experiments." |
| Dataset Splits | Yes | "We generate MTL or CL tasks by randomly splitting these datasets into 10 tasks, with an equal number of classes in each task (van de Ven et al., 2022). ... The sample budget is linearly proportional to the number of training images stored in the memory, where 100% corresponds to storing 10% of the whole training set." |
| Hardware Specification | Yes | "All experiments in the main text are reproducible on a single NVIDIA GeForce RTX 2080 Ti GPU." |
| Software Dependencies | No | "We mainly used PyTorch (Paszke et al., 2019) for our implementations and datasets are freely accessible..." The paper mentions PyTorch but does not provide specific version numbers for it or for other key software components such as Python or CUDA. |
| Experiment Setup | Yes | "We utilized SGD optimizer with a learning rate of 0.01, Nesterov, and 0.95 momentum. We trained all our models for 100 epochs." |
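The dataset-split protocol quoted above (randomly partitioning a dataset's classes into 10 tasks with an equal number of classes each) can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' code; the function name, the fixed seed, and the use of CIFAR-100's 100 classes are assumptions for the example.

```python
import random

def split_classes_into_tasks(num_classes, num_tasks, seed=0):
    """Randomly partition class labels into equally sized tasks,
    e.g. CIFAR-100 -> 10 tasks of 10 classes each.
    Hypothetical helper mirroring the quoted protocol."""
    assert num_classes % num_tasks == 0, "each task must get an equal number of classes"
    labels = list(range(num_classes))
    random.Random(seed).shuffle(labels)          # fixed seed for a reproducible split
    per_task = num_classes // num_tasks
    return [labels[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

tasks = split_classes_into_tasks(num_classes=100, num_tasks=10)
print(len(tasks), len(tasks[0]))  # prints: 10 10
```

Each returned sublist would then select the training and test images for one MTL or CL task.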
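The quoted experiment setup (SGD with learning rate 0.01 and Nesterov momentum 0.95) can be illustrated with a single scalar update step. The sketch below uses the update rule documented for `torch.optim.SGD` with `nesterov=True` (v ← μ·v + g, then p ← p − lr·(g + μ·v)), written in plain Python for clarity; it is an assumption-laden illustration, not the authors' training loop.

```python
# Hyperparameters quoted from the paper's setup.
LR, MU = 0.01, 0.95

def nesterov_sgd_step(param, velocity, grad, lr=LR, mu=MU):
    """One Nesterov-momentum SGD step for a single scalar parameter.
    Mirrors the torch.optim.SGD(nesterov=True) formulation."""
    velocity = mu * velocity + grad               # update the momentum buffer
    param = param - lr * (grad + mu * velocity)   # look-ahead (Nesterov) update
    return param, velocity

# Usage: minimize f(w) = w**2 (gradient 2w) for a few steps.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = nesterov_sgd_step(w, v, grad=2.0 * w)
print(abs(w) < 5.0)  # prints: True (the iterate has moved toward the minimum)
```

In practice this corresponds to `torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95, nesterov=True)` run for the stated 100 epochs.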