Continual Learning Using a Kernel-Based Method Over Foundation Models
Authors: Saleh Momeni, Sahisnu Mazumder, Bing Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation using text and image classification datasets demonstrates that KLDA significantly outperforms baselines. |
| Researcher Affiliation | Collaboration | Saleh Momeni (1), Sahisnu Mazumder (2), Bing Liu (1); (1) Department of Computer Science, University of Illinois Chicago, USA; (2) Intel Labs, USA; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: KLDA Training |
| Open Source Code | Yes | Code https://github.com/salehmomeni/klda |
| Open Datasets | Yes | We conduct experiments on both text and image classification datasets to evaluate our proposed method. For our main experiments, we use the following four text classification datasets: CLINC: It has 150 classes of dialogue intents from many different application domains (Larson et al. 2019). Banking: It has 77 classes of dialogue intents in the banking domain (Casanueva et al. 2020). DBpedia: A text classification dataset of Wikipedia articles with 70 classes (Auer et al. 2007). HWU: Another dialogue intent classification dataset featuring 20 domains with 64 classes (Liu et al. 2021b). Image Datasets: We also evaluate KLDA using four image classification datasets: CIFAR10 and CIFAR100 with 10 and 100 classes respectively (Krizhevsky, Hinton et al. 2009), Tiny ImageNet with 200 classes (Le and Yang 2015), and Stanford Cars with 196 classes (Yang et al. 2015), applying their official train/test splits. |
| Dataset Splits | Yes | CLINC: We used the train/test split of 10,000/750 samples, and the classes were randomly divided into 10 disjoint tasks. Banking: We used a 10,000/1,000 train/test split and divided the classes into 7 disjoint tasks. DBpedia: We used a train/test split of 10,000/1,000 samples and divided the classes into 7 disjoint tasks. HWU: We used a train/test split of 9,000/1,000 samples and partitioned the classes into 8 disjoint tasks. Image Datasets: CIFAR10, CIFAR100, Tiny ImageNet, and Stanford Cars use their official train/test splits. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100 GPU with 80GB of VRAM. |
| Software Dependencies | No | Our implementation is built using PyTorch, with all pretrained models sourced from the Hugging Face Transformers library. Specific version numbers for PyTorch or Hugging Face Transformers are not provided. |
| Experiment Setup | Yes | The Joint Fine-tuning method, representing the upper bound, is trained for 50 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 1e-3 for the classifier head and 1e-4 for the FM parameters. For our ensemble approach KLDA-E, we use a set of 5 models. KLDA has two hyperparameters itself: the transformation dimension D and the RFF kernel bandwidth σ. ... we found that setting D to 5000 provides a good balance... The σ parameter is also empirically determined within the range [10^2, 10^6] for each FM. |
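The hyperparameters in the row above (transformation dimension D and kernel bandwidth σ) refer to a Random Fourier Features (RFF) approximation of an RBF kernel, on top of which class means and a single shared covariance matrix are accumulated for linear discriminant analysis. A minimal sketch of this setup is below; the class and method names are illustrative assumptions, not the authors' released implementation (see their repository for the actual code), and a small D is used here for speed where the paper reports D = 5000.

```python
import numpy as np

class KLDASketch:
    """Sketch: RFF map of frozen foundation-model features, plus
    incrementally accumulated class means and a shared covariance,
    classified with a standard LDA discriminant."""

    def __init__(self, d_in, d_rff=5000, sigma=1e3, seed=0):
        rng = np.random.default_rng(seed)
        # RFF parameters approximating the RBF kernel
        # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
        self.W = rng.normal(0.0, 1.0 / sigma, size=(d_in, d_rff))
        self.b = rng.uniform(0.0, 2 * np.pi, size=d_rff)
        self.d = d_rff
        self.means = {}                       # per-class feature means
        self.cov = np.zeros((d_rff, d_rff))   # shared covariance
        self.n = 0

    def _phi(self, X):
        # Random Fourier feature map: phi(x) = sqrt(2/D) * cos(Wx + b)
        return np.sqrt(2.0 / self.d) * np.cos(X @ self.W + self.b)

    def fit_class(self, X, label):
        # Continual-learning update: statistics for one new class at a
        # time; no stored samples and no gradient updates are needed.
        Z = self._phi(X)
        self.means[label] = Z.mean(axis=0)
        centered = Z - self.means[label]
        self.cov = (self.n * self.cov + centered.T @ centered) / (self.n + len(Z))
        self.n += len(Z)

    def predict(self, X):
        Z = self._phi(X)
        # LDA linear discriminant with the shared covariance
        S_inv = np.linalg.pinv(self.cov + 1e-6 * np.eye(self.d))
        labels = list(self.means)
        scores = np.stack(
            [Z @ S_inv @ self.means[c]
             - 0.5 * self.means[c] @ S_inv @ self.means[c]
             for c in labels], axis=1)
        return [labels[i] for i in scores.argmax(axis=1)]
```

Because only per-class means and one shared covariance are stored, adding a class never touches earlier statistics, which is what makes the method immune to catastrophic forgetting.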