Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models
Authors: Shuvendu Roy, Elham Dolatabadi, Arash Afkanpour, Ali Etemad
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) as well as a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT)... We conduct extensive evaluations across 16 diverse datasets, demonstrating the effectiveness of CoACT in both FSCIL and FSCIT setups. CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%. Furthermore, CoACT exhibits reduced forgetting and enhanced robustness in low-shot experiments. Detailed ablation and sensitivity studies highlight the contribution of each component of CoACT. |
| Researcher Affiliation | Collaboration | Shuvendu Roy (1,2), Elham Dolatabadi (1,3), Arash Afkanpour (1), Ali Etemad (2) — 1: Vector Institute, 2: Queen's University, Canada, 3: York University, Canada |
| Pseudocode | No | The paper describes methods and processes in narrative text and uses diagrams (e.g., Figure 2) to illustrate the architecture, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | Yes | We make our code publicly available at https://github.com/ShuvenduRoy/CoACT-FSCIL. |
| Open Datasets | Yes | Following existing literature on FSCIL, we evaluate CoACT on CIFAR-100 (Krizhevsky et al., 2009), CUB-200 (Wah et al., 2011), and miniImageNet (Russakovsky et al., 2015) datasets. We evaluate our new benchmark, FSCIT, on a diverse set of 16 datasets, including generic object recognition (Caltech101 (Fei-Fei et al., 2004), CIFAR-100 (Krizhevsky et al., 2009), CUB-200 (Wah et al., 2011), miniImageNet (Russakovsky et al., 2015), VOC 2007 (Everingham, 2008)), fine-grained recognition (Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flower102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), FGVCAircraft (Maji et al., 2013)), scene recognition (SUN397 (Xiao et al., 2010), Country211 (Radford et al., 2021)), satellite imagery (EuroSAT (Helber et al., 2019), Resisc-45 (Cheng et al., 2017)), texture recognition (DTD (Cimpoi et al., 2014)), and traffic sign recognition (GTSRB (Houben et al., 2013)). |
| Dataset Splits | Yes | By default, we divide the classes into 10 (or 9) sessions with an equal number of classes and perform 10-shot continual learning... Table 5: Details on class splits over the continual sessions for different datasets... Specifically, we explore fine-tuning the foundation model in the FSCIT setup with only 1, 2, 4, 8, and 16 samples per class. |
| Hardware Specification | Yes | All experiments are conducted on an Nvidia V100 GPU |
| Software Dependencies | No | The paper states 'We implement our framework in PyTorch' but does not provide specific version numbers for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | We implement our framework in PyTorch and train the model using an SGD optimizer with a momentum of 0.9. The base learning rate is set to 0.1, with a batch size of 64, and the model is trained for 50 epochs for the first session and 5 epochs for the remaining sessions. A cosine LR decay scheduler is used to reduce the learning rate over the training epochs. The teacher encoder is updated with a momentum value of 0.999. For experiments with the FSCIL setup, we train the model for 25 epochs with a learning rate of 0.001. We train the model with an input resolution of 224×224. All other implementation details are the same as described above. All experiments are conducted with 3 random seeds, and the reported results are averaged over the three runs. |
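The reported setup combines a cosine learning-rate decay schedule with a momentum (EMA) update of the teacher encoder. A minimal sketch of those two details, using the paper's stated hyperparameters (base LR 0.1, 50 first-session epochs, teacher momentum 0.999); the function names are illustrative and not taken from the CoACT codebase:

```python
import math

def cosine_lr(base_lr: float, epoch: int, total_epochs: int) -> float:
    """Cosine decay from base_lr at epoch 0 toward 0 at total_epochs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

def ema_update(teacher: list, student: list, m: float = 0.999) -> list:
    """Momentum update of teacher parameters: t <- m * t + (1 - m) * s."""
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]

# Reported hyperparameters: base LR 0.1 over 50 first-session epochs.
print(cosine_lr(0.1, 0, 50))   # start of training: 0.1
print(cosine_lr(0.1, 25, 50))  # halfway: 0.05
print(cosine_lr(0.1, 50, 50))  # end: ~0.0
print(ema_update([1.0], [0.0]))  # [0.999]
```

With momentum 0.999, the teacher moves only 0.1% of the way toward the student per step, which is what keeps the teacher a slowly varying, stable target during the contrastive tuning described in the paper.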