Continual Learning Using a Kernel-Based Method Over Foundation Models
Authors: Saleh Momeni, Sahisnu Mazumder, Bing Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation using text and image classification datasets demonstrates that KLDA significantly outperforms baselines. |
| Researcher Affiliation | Collaboration | Saleh Momeni (1), Sahisnu Mazumder (2), Bing Liu (1); (1) Department of Computer Science, University of Illinois Chicago, USA; (2) Intel Labs, USA; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: KLDA Training |
| Open Source Code | Yes | Code https://github.com/salehmomeni/klda |
| Open Datasets | Yes | We conduct experiments on both text and image classification datasets to evaluate our proposed method. For our main experiments, we use the following four text classification datasets: CLINC: It has 150 classes of dialogue intents from many different application domains (Larson et al. 2019). Banking: It has 77 classes of dialogue intents in the banking domain (Casanueva et al. 2020). DBpedia: A text classification dataset of Wikipedia articles with 70 classes (Auer et al. 2007). HWU: Another dialogue intent classification dataset featuring 20 domains with 64 classes (Liu et al. 2021b). Image Datasets: We also evaluate KLDA using four image classification datasets: CIFAR10 and CIFAR100 with 10 and 100 classes respectively (Krizhevsky, Hinton et al. 2009), Tiny ImageNet with 200 classes (Le and Yang 2015), and Stanford Cars with 196 classes (Yang et al. 2015), applying their official train/test splits. |
| Dataset Splits | Yes | CLINC: We used the train/test split of 10,000/750 samples, and the classes were randomly divided into 10 disjoint tasks. Banking: We used a 10,000/1,000 train/test split and divided the classes into 7 disjoint tasks. DBpedia: We used a train/test split of 10,000/1,000 samples and divided the classes into 7 disjoint tasks. HWU: We used a train/test split of 9,000/1,000 samples and partitioned the classes into 8 disjoint tasks. Image Datasets: CIFAR10, CIFAR100, Tiny ImageNet, and Stanford Cars use their official train/test splits. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100 GPU with 80GB of VRAM. |
| Software Dependencies | No | Our implementation is built using PyTorch, with all pretrained models sourced from the Hugging Face Transformers library. Specific version numbers for PyTorch or Hugging Face Transformers are not provided. |
| Experiment Setup | Yes | The Joint Fine-tuning method, representing the upper bound, is trained for 50 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 1e-3 for the classifier head and 1e-4 for the FM parameters. For our ensemble approach KLDA-E, we use a set of 5 models. KLDA has two hyperparameters itself: the transformation dimension D and the RFF kernel bandwidth σ. ... we found that setting D to 5000 provides a good balance... The σ parameter is also empirically determined within the range [10^2, 10^6] for each FM. |
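The hyperparameters in the row above (transformation dimension D and kernel bandwidth σ) refer to a Random Fourier Features (RFF) approximation of an RBF kernel, on top of which class means and a single shared covariance matrix are accumulated for linear discriminant analysis. A minimal sketch of this setup is below; the class and method names are illustrative assumptions, not the authors' released implementation (see their repository for the actual code), and a small D is used here for speed where the paper reports D = 5000.

```python
import numpy as np

class KLDASketch:
    """Sketch: RFF map of frozen foundation-model features, plus
    incrementally accumulated class means and a shared covariance,
    classified with a standard LDA discriminant."""

    def __init__(self, d_in, d_rff=5000, sigma=1e3, seed=0):
        rng = np.random.default_rng(seed)
        # RFF parameters approximating the RBF kernel
        # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
        self.W = rng.normal(0.0, 1.0 / sigma, size=(d_in, d_rff))
        self.b = rng.uniform(0.0, 2 * np.pi, size=d_rff)
        self.d = d_rff
        self.means = {}                       # per-class feature means
        self.cov = np.zeros((d_rff, d_rff))   # shared covariance
        self.n = 0

    def _phi(self, X):
        # Random Fourier feature map: phi(x) = sqrt(2/D) * cos(Wx + b)
        return np.sqrt(2.0 / self.d) * np.cos(X @ self.W + self.b)

    def fit_class(self, X, label):
        # Continual-learning update: statistics for one new class at a
        # time; no stored samples and no gradient updates are needed.
        Z = self._phi(X)
        self.means[label] = Z.mean(axis=0)
        centered = Z - self.means[label]
        self.cov = (self.n * self.cov + centered.T @ centered) / (self.n + len(Z))
        self.n += len(Z)

    def predict(self, X):
        Z = self._phi(X)
        # LDA linear discriminant with the shared covariance
        S_inv = np.linalg.pinv(self.cov + 1e-6 * np.eye(self.d))
        labels = list(self.means)
        scores = np.stack(
            [Z @ S_inv @ self.means[c]
             - 0.5 * self.means[c] @ S_inv @ self.means[c]
             for c in labels], axis=1)
        return [labels[i] for i in scores.argmax(axis=1)]
```

Because only per-class means and one shared covariance are stored, adding a class never touches earlier statistics, which is what makes the method immune to catastrophic forgetting.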