Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models
Authors: Shuvendu Roy, Elham Dolatabadi, Arash Afkanpour, Ali Etemad
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) as well as a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT)... We conduct extensive evaluations across 16 diverse datasets, demonstrating the effectiveness of CoACT in both FSCIL and FSCIT setups. CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%. Furthermore, CoACT exhibits reduced forgetting and enhanced robustness in low-shot experiments. Detailed ablation and sensitivity studies highlight the contribution of each component of CoACT. |
| Researcher Affiliation | Collaboration | Shuvendu Roy (1,2), Elham Dolatabadi (1,3), Arash Afkanpour (1), Ali Etemad (2) — 1: Vector Institute, 2: Queen's University, Canada, 3: York University, Canada |
| Pseudocode | No | The paper describes methods and processes in narrative text and uses diagrams (e.g., Figure 2) to illustrate the architecture, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | We make our code publicly available at https://github.com/ShuvenduRoy/CoACT-FSCIL. |
| Open Datasets | Yes | Following existing literature on FSCIL, we evaluate CoACT on CIFAR-100 (Krizhevsky et al., 2009), CUB-200 (Wah et al., 2011), and miniImageNet (Russakovsky et al., 2015) datasets. We evaluate our new benchmark, FSCIT, on a diverse set of 16 datasets, including generic object recognition (Caltech101 (Fei-Fei et al., 2004), CIFAR-100 (Krizhevsky et al., 2009), CUB-200 (Wah et al., 2011), miniImageNet (Russakovsky et al., 2015), VOC 2007 (Everingham, 2008)), fine-grained recognition (Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flower102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013)), scene recognition (SUN397 (Xiao et al., 2010), Country211 (Radford et al., 2021)), satellite imagery (EuroSAT (Helber et al., 2019), Resisc-45 (Cheng et al., 2017)), texture recognition (DTD (Cimpoi et al., 2014)), and traffic sign recognition (GTSRB (Houben et al., 2013)). |
| Dataset Splits | Yes | By default, we divide the classes into 10 (or 9) sessions with an equal number of classes and perform 10-shot continual learning... Table 5: Details on class splits over the continual sessions for different datasets... Specifically, we explore fine-tuning the foundation model in the FSCIT setup with only 1, 2, 4, 8, and 16 samples per class. |
| Hardware Specification | Yes | All experiments are conducted on an Nvidia V100 GPU |
| Software Dependencies | No | The paper states 'We implement our framework in PyTorch' but does not provide specific version numbers for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | We implement our framework in PyTorch and train the model using an SGD optimizer with a momentum of 0.9. The base learning rate is set to 0.1, with a batch size of 64, and the model is trained for 50 epochs for the first session and 5 epochs for the remaining sessions. A cosine LR decay scheduler is used to reduce the learning rate over the training epochs. The teacher encoder is updated with a momentum value of 0.999. For experiments with the FSCIL setup, we train the model for 25 epochs with a learning rate of 0.001. We train the model with an input resolution of 224×224. All other implementation details are the same as described above. All experiments are conducted with 3 random seeds, and the reported results are averaged over the three runs. |
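The reported setup combines a cosine learning-rate decay schedule with a momentum (EMA) update of the teacher encoder. A minimal sketch of those two details, using the paper's stated hyperparameters (base LR 0.1, 50 first-session epochs, teacher momentum 0.999); the function names are illustrative and not taken from the CoACT codebase:

```python
import math

def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    """Cosine decay from base_lr at epoch 0 toward 0 at total_epochs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

def ema_update(teacher: list, student: list, m: float = 0.999) -> list:
    """Momentum update of teacher parameters: t <- m * t + (1 - m) * s."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

# Reported hyperparameters: base LR 0.1 over 50 first-session epochs.
print(cosine_lr(0.1, 0, 50))   # start of training: 0.1
print(cosine_lr(0.1, 25, 50))  # halfway: 0.05
print(cosine_lr(0.1, 50, 50))  # end: ~0.0
print(ema_update([1.0], [0.0]))  # [0.999]
```

With momentum 0.999, the teacher moves only 0.1% of the way toward the student per step, which is what keeps the teacher a slowly varying, stable target during the contrastive tuning described in the paper.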