LADA: Scalable Label-Specific CLIP Adapter for Continual Learning

Authors: Mao-Lin Luo, Zi-Hao Zhou, Tong Wei, Min-Ling Zhang

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results show that LADA achieves state-of-the-art performance in continual learning settings. The implementation code is available at https://github.com/MaolinLuo/LADA. ... We evaluate our method on both 16-shot and full-shot settings.
Researcher Affiliation | Academia | School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China. Correspondence to: Tong Wei <EMAIL>.
Pseudocode | No | The paper describes its methods using mathematical formulations (Eq. 1-10) and prose, but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation code is available at https://github.com/MaolinLuo/LADA.
Open Datasets | Yes | We conduct experiments on the recently proposed X-TAIL (Xu et al., 2024) benchmark, which consists of 10 image classification datasets: Aircraft (Maji et al., 2013), Caltech101 (Fei-Fei et al., 2004), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2019), Flowers (Nilsback & Zisserman, 2008), Food (Bossard et al., 2014), MNIST (Deng, 2012), Oxford Pet (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), and SUN397 (Xiao et al., 2010).
Dataset Splits | Yes | In addition to the 16-shot setting proposed by Xu et al. (2024), in which 16 training samples per class were selected for each task, we also evaluate the benchmark under a full-shot setting. This more realistic scenario maintains the original dataset distribution, with varying numbers of training samples across tasks, providing a more comprehensive and challenging evaluation for continual learning methods.
Hardware Specification | Yes | All experiments of LADA are conducted on a single NVIDIA 4090 GPU.
Software Dependencies | No | The paper mentions the use of CLIP, ViT-B/16, AdaptFormer, and the AdamW optimizer, but does not specify version numbers for these or any other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The training process is carried out using the AdamW (Loshchilov & Hutter, 2019) optimizer, with a learning rate of 0.001 and a batch size of 64 across all tasks. For the primary experiments, we set the hyperparameters as λ1 = 16 and λ2 = 4.
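The reported experiment setup can be collected into a single configuration object for reproduction attempts. This is a minimal sketch: the class name `LADATrainConfig` and its field names are illustrative assumptions, not taken from the authors' repository, and only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LADATrainConfig:
    """Hyperparameters reported in the paper's experiment setup.

    Note: this is an illustrative container, not the authors' code.
    """
    optimizer: str = "AdamW"      # Loshchilov & Hutter, 2019
    learning_rate: float = 1e-3   # used across all tasks
    batch_size: int = 64          # used across all tasks
    lambda1: float = 16.0         # λ1 in the paper's objective
    lambda2: float = 4.0          # λ2 in the paper's objective

cfg = LADATrainConfig()
```

Freezing the dataclass makes the configuration immutable, which helps keep reproduction runs consistent with the reported settings.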