CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

Authors: Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun, Kede Ma, Ying Wei

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
"Leveraging CLDyB, we first conduct a joint evaluation of multiple state-of-the-art CL methods, leading to a set of commonly challenging and generalizable task sequences where existing CL methods tend to perform poorly. We then conduct separate evaluations of individual CL methods using CLDyB, discovering their respective strengths and weaknesses. [...] In this section, we first present the experimental setups, and then evaluate nine state-of-the-art CL methods both jointly and separately using CLDyB, followed by insightful analysis."

Researcher Affiliation: Academia
"City University of Hong Kong; Nanyang Technological University; Australian National University; Zhejiang University"

Pseudocode: Yes
"The pseudocode for CLDyB can be found in Algorithm 3 of the Appendix. [...] C PSEUDOCODE FOR CLDYB"

Open Source Code: Yes
"The source code and generated task sequences are publicly accessible at https://github.com/szc12153/CLDyB."

Open Datasets: Yes
"To build the playground for CLDyB, we assemble a total of 26 datasets, consisting of 2,505,185 images across 2,403 categories. These datasets are categorized into two main groups: photographic image datasets for the main experiments and AI-generated image datasets for additional analysis. The included classes cover a broad spectrum of domains, levels of granularity, cultural contexts, and time periods. Details are provided in Appendix A. [...] Photographic image data. This portion of the CLDyB data pool includes 2,043 classes from 22 publicly available image recognition datasets, and can be categorized into two major domains: [...] Table 2: Statistics of the CLDyB data pool."

Dataset Splits: No
"Each task T_t contains |T_t| (image, label) pairs {(x_t^(j), y_t^(j))}_{j=1}^{|T_t|}, divided into training, validation, and test subsets. [...] During training, hyper-parameters for each method are selected using the validation sets of the first three tasks, following Chaudhry et al. (2018)." The paper mentions training, validation, and test subsets, but does not provide specific percentages or sample counts for these splits.

Hardware Specification: No
The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory).

Software Dependencies: No
The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow).

Experiment Setup: Yes
"Each task in the CLDyB sequences is a 20-category classification problem (i.e., K = 20) with a sequence length of ten (i.e., N = 10). All CL methods employ ViT-Base-Sup21K (Dosovitskiy et al., 2021) as the foundation model. All main experiments are repeated across three random runs, and mean results are reported. To ensure a fair comparison of task selection strategies, the first task is randomly chosen and fixed, maintaining consistent initial conditions. [...] During training, hyper-parameters for each method are selected using the validation sets of the first three tasks, following Chaudhry et al. (2018)."
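The setup above evaluates CL methods over a sequence of N = 10 tasks and reports mean results. A minimal sketch of how such sequences are typically scored, using the standard continual-learning definitions of average accuracy and forgetting over an accuracy matrix (these metric definitions are common CL practice and an assumption here, not the paper's exact evaluation code):

```python
# acc[i][j] = test accuracy on task j after training on tasks 0..i.
# Standard CL metrics (assumed definitions, not taken from the paper):
# average accuracy over all tasks at the end of the sequence, and
# average forgetting (best-ever accuracy minus final accuracy per task).

def average_accuracy(acc):
    """Mean accuracy over all seen tasks after the final task."""
    n = len(acc)
    return sum(acc[n - 1][j] for j in range(n)) / n

def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    n = len(acc)
    drops = [
        max(acc[i][j] for i in range(j, n)) - acc[n - 1][j]
        for j in range(n - 1)  # the last task cannot be forgotten yet
    ]
    return sum(drops) / len(drops)

# Toy 3-task example (illustrative numbers only; the paper uses N = 10).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.88, 0.00],
    [0.75, 0.85, 0.92],
]
print(average_accuracy(acc))    # (0.75 + 0.85 + 0.92) / 3 = 0.84
print(average_forgetting(acc))  # ((0.90 - 0.75) + (0.88 - 0.85)) / 2 = 0.09
```

In the paper's setting, each method would produce one such matrix per random run, with the reported numbers averaged over the three runs.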