BECAME: Bayesian Continual Learning with Adaptive Model Merging

Authors: Mei Li, Yuxiang Lu, Qinyan Dai, Suizhi Huang, Yue Ding, Hongtao Lu

ICML 2025

Reproducibility assessment (each variable below is listed with its result and the supporting evidence from the paper):
Research Type: Experimental
  Evidence: "To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies. Code is available at https://github.com/limei0818/BECAME. [...] Section 4. Experiments"
Researcher Affiliation: Academia
  Evidence: "Mei Li*¹, Yuxiang Lu*¹, Qinyan Dai¹, Suizhi Huang¹, Yue Ding¹, Hongtao Lu¹ [...] ¹Shanghai Jiao Tong University. Correspondence to: Yue Ding <EMAIL>."
Pseudocode: Yes
  Evidence: "Algorithm 1 Pseudo-codes for BECAME"
Open Source Code: Yes
  Evidence: "Code is available at https://github.com/limei0818/BECAME."
Open Datasets: Yes
  Evidence: "We conduct our experiments on four widely-used benchmarks: 20-Split CIFAR-100 (Krizhevsky et al., 2009), 10-Split CIFAR-100, 25-Split TinyImageNet (Wu et al., 2017), and 20-Split MiniImageNet (Vinyals et al., 2016)."
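The "N-Split" benchmark names refer to partitioning a dataset's label space into N disjoint class-incremental tasks (e.g., 20-Split CIFAR-100 divides 100 classes into 20 tasks of 5 classes each). A minimal sketch of that partition, assuming a simple sequential class ordering (the paper's actual class ordering may differ):

```python
def split_classes(num_classes: int, num_tasks: int) -> list:
    """Partition class ids 0..num_classes-1 into num_tasks disjoint tasks.

    Sequential ordering is an assumption for illustration only; the
    BECAME code may shuffle or otherwise reorder classes first.
    """
    if num_classes % num_tasks != 0:
        raise ValueError("num_tasks must divide num_classes evenly")
    per_task = num_classes // num_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(num_tasks)]

# 20-Split CIFAR-100: 100 classes become 20 tasks of 5 classes each.
tasks = split_classes(100, 20)
print(len(tasks), len(tasks[0]))  # 20 5
```

The same helper covers the other benchmarks in the quote, e.g. `split_classes(100, 10)` for 10-Split CIFAR-100.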
Dataset Splits: Yes
  Evidence: "For GPM-based experiments, the dataset is split into 95% for training and 5% for validation, with no data augmentation applied across all three datasets. In NSCL-based experiments, the entire training dataset is utilized, with data augmentation applied via a random crop with 4-pixel padding and a random horizontal flip. [...] Table 5. Dataset statistics for GPM-based experiments. [...] Table 6. Dataset statistics for NSCL-based experiments."
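The 95%/5% train/validation split quoted for the GPM-based experiments can be sketched as below; the seeding and index bookkeeping are illustrative assumptions, not taken from the released code:

```python
import random

def train_val_split(n_samples: int, val_frac: float = 0.05, seed: int = 0):
    """Hold out val_frac of the sample indices for validation, mirroring
    the 95%/5% split used in the GPM-based experiments (which apply no
    data augmentation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_val = int(n_samples * val_frac)
    return indices[n_val:], indices[:n_val]  # (train, val)

# CIFAR-100 has 50,000 training images.
train_idx, val_idx = train_val_split(50_000)
print(len(train_idx), len(val_idx))  # 47500 2500
```

For the NSCL-based runs, the quoted augmentation (random crop with 4-pixel padding plus random horizontal flip) would typically be expressed with torchvision's `RandomCrop(32, padding=4)` and `RandomHorizontalFlip()` transforms.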
Hardware Specification: Yes
  Evidence: "All experiments are performed on a single NVIDIA GeForce RTX 4080 GPU."
Software Dependencies: No
  Evidence: The paper mentions using the Adam optimizer and EWC for regularization, but does not specify version numbers for these or any other software dependencies (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes
  Evidence: "Implementation Details. For the first task, the training process is identical to that of the corresponding baselines. For subsequent tasks t ∈ {2, 3, …, T}, our method involves two stages as mentioned above. All experiments are repeated with 5 random seeds, and we report the mean and standard deviation of the results. The hyperparameter configurations in both stages are mostly consistent with those of the baselines, except for subtle adjustments for adapting our method, with details provided in Appendix B.4 to ensure reproducibility. [...] Hyperparameter settings for each GPM-based baseline. Marked values are sourced from the corresponding papers or supplementary materials, while other values are derived from the code."

  lr           0.01   0.01   0.05   0.05   0.1    0.1    0.1    0.1
  lr_min       1e-5   1e-5   5e-5   5e-5   1e-3   1e-3   1e-3   1e-3
  lr patience  6      6      6      6      5      5      5      5
  lr factor    2      2      2      2      3      3      3      3
  n_epochs     200    200    200    200    10     100    10     200
  batch size   64     64     64     64     10     64     10     64
  ϵ₀           0.97   0.97   0.97   0.97   0.985  0.985  0.985  0.98
  ϵ            3e-3   3e-3   3e-3   3e-3   3e-4   3e-4   3e-4   1e-3
  α            10 5 1 3
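The 5-seed protocol quoted under Experiment Setup reduces to a simple aggregation over independent runs. A minimal sketch, where `run_experiment` is a hypothetical stand-in for one full training run and the seed values are assumptions:

```python
import statistics

def aggregate_over_seeds(run_experiment, seeds=(0, 1, 2, 3, 4)):
    """Repeat an experiment across fixed seeds and report the mean and
    (sample) standard deviation, matching the 5-seed protocol above."""
    results = [run_experiment(seed) for seed in seeds]
    return statistics.mean(results), statistics.stdev(results)

def run_experiment(seed: int) -> float:
    # Hypothetical stand-in: a real run would train and evaluate the
    # model under this seed. The returned numbers are placeholders,
    # not results from the paper.
    return 70.0 + 0.1 * seed

mean_acc, std_acc = aggregate_over_seeds(run_experiment)
print(f"{mean_acc:.2f} ± {std_acc:.2f}")  # 70.20 ± 0.16
```

Reporting the sample standard deviation (ddof=1, as `statistics.stdev` does) is the common convention for small numbers of repeats like the five seeds used here.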