SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Authors: Yichen Wu, Hongming Piao, Long-Kai Huang, Renzhen Wang, Wanhua Li, Hanspeter Pfister, Deyu Meng, Kede Ma, Ying Wei

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical and theoretical analysis reveals that SD-LoRA tends to follow a low-loss trajectory and converges to an overlapping low-loss region for all learned tasks, resulting in an excellent stability-plasticity trade-off. Building upon these insights, we introduce two variants of SD-LoRA with further improved parameter efficiency. All parameters of SD-LoRAs can be end-to-end optimized for CL objectives. Meanwhile, they support efficient inference by allowing direct evaluation with the finally trained model, obviating the need for component selection. Extensive experiments across multiple CL benchmarks and foundation models consistently validate the effectiveness of SD-LoRA.
Researcher Affiliation | Collaboration | 1City University of Hong Kong, 2Harvard University, 3Xi'an Jiaotong University, 4Tencent AI Lab, 5Zhejiang University, 6Pengcheng Laboratory
Pseudocode | Yes | Algorithm 1: SD-LoRA and its Variants on the Current Task Tt
Open Source Code | Yes | The code is available at https://github.com/WuYichen-97/SD-Lora-CL.
Open Datasets | Yes | Following (Gao et al., 2023; Liang & Li, 2024), we evaluate SD-LoRAs on three standard CL benchmarks: ImageNet-R (Boschini et al., 2022), ImageNet-A (Hendrycks et al., 2021), and DomainNet (Peng et al., 2019). Additionally, we include CIFAR100 (Krizhevsky, 2009) and CUB200 (Wah et al., 2011) results in Appendix A.3.
Dataset Splits | Yes | Specifically, ImageNet-R consists of 200 ImageNet classes (Deng et al., 2009) rendered in artistic styles. ImageNet-A features 200 classes with natural adversarial examples, often misclassified by standard ImageNet-trained models. DomainNet includes 345 classes across six distinct domains. As common practices (Liang & Li, 2024; Huang et al., 2024), we split ImageNet-R into 5/10/20 tasks (40/20/10 classes per task), ImageNet-A into 10 tasks (20 classes each), and DomainNet into 5 tasks (69 classes each). Additionally, we include CIFAR100 (Krizhevsky, 2009) and CUB200 (Wah et al., 2011) results in Appendix A.3.
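The split protocol quoted above (equal-sized, disjoint class partitions per task) can be sketched in a few lines. This is an illustrative sketch only: `split_classes` is a hypothetical helper name, and the contiguous class ordering is an assumption, since the quoted text does not state how class indices are assigned to tasks.

```python
# Sketch of the class-incremental splits described in the quoted protocol.
# Assumption: classes are partitioned contiguously; any shuffling of class
# order used in the actual experiments is not reproduced here.

def split_classes(num_classes, num_tasks):
    """Partition class indices 0..num_classes-1 into equal disjoint task splits."""
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]

# ImageNet-R: 200 classes into 5/10/20 tasks -> 40/20/10 classes per task
for n_tasks in (5, 10, 20):
    tasks = split_classes(200, n_tasks)
    print(n_tasks, len(tasks[0]))

# ImageNet-A: 200 classes into 10 tasks of 20; DomainNet: 345 classes into 5 tasks of 69
imagenet_a = split_classes(200, 10)
domainnet = split_classes(345, 5)
```

Each of the quoted splits is consistent with an even partition: 200/5 = 40, 200/10 = 20, 200/20 = 10, and 345/5 = 69.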
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running its experiments. It only mentions using 'ViT-B/16' as the foundation model.
Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' as an optimizer but does not specify any software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For all methods, training is carried out by Adam (Kingma & Ba, 2014) with a learning rate of 0.008 and a minibatch size of 128 for 30 epochs on ImageNet-R, 10 epochs on DomainNet, and 20 epochs on all other datasets. We report mean results across five runs with standard errors. The SD-LoRA components are inserted into the attention layers of all Transformer blocks, modifying the query and value projections, with a fixed rank of r1 = 10. For SD-LoRA-RR, we set the additional rank parameters as µ = 4, ν = 8, rµ = 8, and rν = 6. For SD-LoRA-KD, we set the threshold for the fitting residual to τ = 9 × 10^-4.
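The setup quoted above attaches rank-10 low-rank adapters to the query and value projections of each attention layer. A minimal sketch of one such adapter on a single frozen projection is below, assuming the standard LoRA parameterization (frozen weight plus a trainable BA product, with B initialized to zero); it does not reproduce SD-LoRA's decoupling of components across tasks, and d_model = 768 is assumed from the ViT-B/16 hidden size.

```python
import numpy as np

# Minimal LoRA sketch for one frozen projection (e.g., the query projection
# of an attention layer), with the fixed rank r1 = 10 from the quoted setup.
# Illustrative only: not the authors' SD-LoRA implementation.

rng = np.random.default_rng(0)
d_model, rank = 768, 10                               # ViT-B/16 hidden size (assumed); rank r1 = 10

W_q = rng.standard_normal((d_model, d_model)) * 0.02  # frozen pretrained weight
A = rng.standard_normal((rank, d_model)) * 0.01       # trainable down-projection
B = np.zeros((d_model, rank))                         # trainable up-projection, zero-initialized

def lora_forward(x, W, B, A):
    """y = x W^T + x (B A)^T : frozen path plus low-rank trainable correction."""
    return x @ W.T + x @ (B @ A).T

x = rng.standard_normal((128, d_model))               # minibatch size 128, as in the quoted setup
y = lora_forward(x, W_q, B, A)
assert y.shape == (128, d_model)

# With B initialized to zero, the adapter contributes nothing at the start
# of training, so the forward pass equals the frozen projection:
assert np.allclose(y, x @ W_q.T)
```

The zero initialization of B is the usual LoRA convention: it guarantees the adapted model starts exactly at the pretrained model, while only the 2 × 10 × 768 adapter parameters per projection are trained.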