BECAME: Bayesian Continual Learning with Adaptive Model Merging
Authors: Mei Li, Yuxiang Lu, Qinyan Dai, Suizhi Huang, Yue Ding, Hongtao Lu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies. Code is available at https://github.com/limei0818/BECAME. [...] Section 4. Experiments |
| Researcher Affiliation | Academia | Mei Li*¹, Yuxiang Lu*¹, Qinyan Dai¹, Suizhi Huang¹, Yue Ding¹, Hongtao Lu¹ [...] ¹Shanghai Jiao Tong University. Correspondence to: Yue Ding <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Pseudo-codes for BECAME |
| Open Source Code | Yes | Code is available at https://github.com/limei0818/BECAME. |
| Open Datasets | Yes | We conduct our experiments on four widely used benchmarks: 20-Split CIFAR-100 (Krizhevsky et al., 2009), 10-Split CIFAR-100, 25-Split TinyImageNet (Wu et al., 2017), and 20-Split MiniImageNet (Vinyals et al., 2016). |
| Dataset Splits | Yes | For GPM-based experiments, the dataset is split into 95% for training and 5% for validation, with no data augmentation applied across all three datasets. In NSCL-based experiments, the entire training dataset is utilized, with data augmentation applied via a random crop with 4-pixel padding and a random horizontal flip. [...] Table 5. Dataset statistics for GPM-based experiments. [...] Table 6. Dataset statistics for NSCL-based experiments. |
| Hardware Specification | Yes | All experiments are performed on a single NVIDIA GeForce RTX 4080 GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer and EWC for regularization, but does not specify any version numbers for these or other software libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Implementation Details. For the first task, the training process is identical to that of the corresponding baselines. For subsequent tasks t ∈ {2, 3, …, T}, our method involves two stages as mentioned above. All experiments are repeated with 5 random seeds, and we report the mean and standard deviation of the results. The hyperparameter configurations in both stages are mostly consistent with those of the baselines, except for subtle adjustments made to adapt our method, with details provided in Appendix B.4 to ensure reproducibility. [...] Hyperparameter settings for each GPM-based baseline (marked values are sourced from the corresponding papers or supplementary materials; other values are derived from the code): lr: 0.01, 0.01, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1; lr_min: 10⁻⁵, 10⁻⁵, 5·10⁻⁵, 5·10⁻⁵, 10⁻³, 10⁻³, 10⁻³, 10⁻³; lr_patience: 6, 6, 6, 6, 5, 5, 5, 5; lr_factor: 2, 2, 2, 2, 3, 3, 3, 3; n_epochs: 200, 200, 200, 200, 10, 100, 10, 200; batchsize: 64, 64, 64, 64, 10, 64, 10, 64; ϵ₀: 0.97, 0.97, 0.97, 0.97, 0.985, 0.985, 0.985, 0.98; ϵ: 3·10⁻³, 3·10⁻³, 3·10⁻³, 3·10⁻³, 3·10⁻⁴, 3·10⁻⁴, 3·10⁻⁴, 10⁻³; α: 10⁻⁵, 1, 3 |
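The 95%/5% train/validation split (Dataset Splits row) and the 5-seed mean ± standard deviation reporting (Experiment Setup row) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the example accuracy values are hypothetical and not taken from the BECAME codebase.

```python
import random
import statistics


def split_train_val(indices, val_fraction=0.05, seed=0):
    """Shuffle indices and hold out val_fraction for validation (95%/5% split)."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    # Validation set takes the first n_val shuffled indices; the rest is training.
    return shuffled[n_val:], shuffled[:n_val]


def mean_std(results):
    """Aggregate repeated runs (e.g., 5 random seeds) into mean and sample std."""
    return statistics.mean(results), statistics.stdev(results)


# Example: split the 50,000 CIFAR-100 training examples 95%/5%.
train_idx, val_idx = split_train_val(list(range(50_000)))
print(len(train_idx), len(val_idx))  # 47500 2500

# Example: report final accuracy over 5 seeds (values are placeholders).
accs = [72.1, 71.8, 72.5, 72.0, 71.9]
mean_acc, std_acc = mean_std(accs)
print(f"{mean_acc:.2f} ± {std_acc:.2f}")
```

Note that in the paper's GPM-based experiments this split is applied per dataset with no augmentation, whereas the NSCL-based experiments train on the full training set with random crop (4-pixel padding) and horizontal flip augmentation.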