SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
Authors: Yichen Wu, Hongming Piao, Long-Kai Huang, Renzhen Wang, Wanhua Li, Hanspeter Pfister, Deyu Meng, Kede Ma, Ying Wei
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical and theoretical analysis reveals that SD-LoRA tends to follow a low-loss trajectory and converges to an overlapping low-loss region for all learned tasks, resulting in an excellent stability-plasticity trade-off. Building upon these insights, we introduce two variants of SD-LoRA with further improved parameter efficiency. All parameters of SD-LoRAs can be end-to-end optimized for CL objectives. Meanwhile, they support efficient inference by allowing direct evaluation with the finally trained model, obviating the need for component selection. Extensive experiments across multiple CL benchmarks and foundation models consistently validate the effectiveness of SD-LoRA. |
| Researcher Affiliation | Collaboration | 1City University of Hong Kong, 2Harvard University, 3Xi'an Jiaotong University, 4Tencent AI Lab, 5Zhejiang University, 6Pengcheng Laboratory |
| Pseudocode | Yes | Algorithm 1 SD-LoRA and its Variants on the Current Task Tt |
| Open Source Code | Yes | The code is available at https://github.com/WuYichen-97/SD-Lora-CL. |
| Open Datasets | Yes | Following (Gao et al., 2023; Liang & Li, 2024), we evaluate SD-LoRAs on three standard CL benchmarks: ImageNet-R (Boschini et al., 2022), ImageNet-A (Hendrycks et al., 2021), and DomainNet (Peng et al., 2019). Additionally, we include CIFAR100 (Krizhevsky, 2009) and CUB200 (Wah et al., 2011) results in Appendix A.3. |
| Dataset Splits | Yes | Specifically, ImageNet-R consists of 200 ImageNet classes (Deng et al., 2009) rendered in artistic styles. ImageNet-A features 200 classes with natural adversarial examples, often misclassified by standard ImageNet-trained models. DomainNet includes 345 classes across six distinct domains. Following common practice (Liang & Li, 2024; Huang et al., 2024), we split ImageNet-R into 5/10/20 tasks (40/20/10 classes per task), ImageNet-A into 10 tasks (20 classes each), and DomainNet into 5 tasks (69 classes each). Additionally, we include CIFAR100 (Krizhevsky, 2009) and CUB200 (Wah et al., 2011) results in Appendix A.3. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running its experiments. It only mentions using 'ViT-B/16' as the foundation model. |
| Software Dependencies | No | The paper mentions 'Adam (Kingma & Ba, 2014)' as an optimizer but does not specify any software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For all methods, training is carried out by Adam (Kingma & Ba, 2014) with a learning rate of 0.008 and a minibatch size of 128 for 30 epochs on ImageNet-R, 10 epochs on DomainNet, and 20 epochs on all other datasets. We report mean results across five runs with standard errors. The SD-LoRA components are inserted into the attention layers of all Transformer blocks, modifying the query and value projections, with a fixed rank of r1 = 10. For SD-LoRA-RR, we set the additional rank parameters as µ = 4, ν = 8, rµ = 8, and rν = 6. For SD-LoRA-KD, we set the threshold for the fitting residual to τ = 9 × 10⁻⁴. |
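The setup row above describes a rank-10 LoRA update on the query and value projections, with SD-LoRA's decoupling of update magnitude from direction. A minimal NumPy sketch of that parameterization is below; the hidden size `d = 768` (ViT-B/16), the zero/random initialization, and the explicit unit-norm `direction` variable are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 10          # d: assumed ViT-B/16 hidden size; r matches the paper's r1 = 10

W_q = rng.standard_normal((d, d)) * 0.02   # frozen pre-trained query projection (illustrative)
A = rng.standard_normal((r, d)) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                       # zero-init so the update starts at zero
alpha = 1.0                                # trainable magnitude, decoupled from direction

delta = B @ A                              # rank-r update, shape (d, d)
norm = np.linalg.norm(delta)
direction = delta / norm if norm > 0 else delta   # unit-norm direction (assumption)
W_eff = W_q + alpha * direction            # effective query weight seen at inference

assert np.allclose(W_eff, W_q)  # zero-init B => no change before any training
```

The same construction would be repeated for the value projection in every Transformer block; at 2 × d × r trainable direction parameters per projection plus one magnitude scalar, the per-block overhead stays small relative to the d × d frozen weights.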