CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

Authors: Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun, Kede Ma, Ying Wei

ICLR 2025

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
"Leveraging CLDyB, we first conduct a joint evaluation of multiple state-of-the-art CL methods, leading to a set of commonly challenging and generalizable task sequences where existing CL methods tend to perform poorly. We then conduct separate evaluations of individual CL methods using CLDyB, discovering their respective strengths and weaknesses. [...] In this section, we first present the experimental setups, and then evaluate nine state-of-the-art CL methods both jointly and separately using CLDyB, followed by insightful analysis."

Researcher Affiliation: Academia
"City University of Hong Kong; Nanyang Technological University; Australian National University; Zhejiang University"

Pseudocode: Yes
"The pseudocode for CLDyB can be found in Algorithm 3 of the Appendix. [...] C PSEUDOCODE FOR CLDYB"

Open Source Code: Yes
"The source code and generated task sequences are publicly accessible at https://github.com/szc12153/CLDyB."

Open Datasets: Yes
"To build the playground for CLDyB, we assemble a total of 26 datasets, consisting of 2,505,185 images across 2,403 categories. These datasets are categorized into two main groups: photographic image datasets for the main experiments and AI-generated image datasets for additional analysis. The included classes cover a broad spectrum of domains, levels of granularity, cultural contexts, and time periods. Details are provided in Appendix A. [...] Photographic image data. This portion of the CLDyB data pool includes 2,043 classes from 22 publicly available image recognition datasets, and can be categorized into two major domains: [...] Table 2: Statistics of the CLDyB data pool."

Dataset Splits: No
"Each task T_t contains |T_t| (image, label) pairs {(x_t^(j), y_t^(j))}_{j=1}^{|T_t|}, divided into training, validation, and test subsets. [...] During training, hyper-parameters for each method are selected using the validation sets of the first three tasks, following Chaudhry et al. (2018)." The paper mentions training, validation, and test subsets, but does not provide specific percentages or sample counts for these splits.

Hardware Specification: No
The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory).

Software Dependencies: No
The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow).

Experiment Setup: Yes
"Each task in the CLDyB sequences is a 20-category classification problem (i.e., K = 20) with a sequence length of ten (i.e., N = 10). All CL methods employ ViT-Base-Sup21K (Dosovitskiy et al., 2021) as the foundation model. All main experiments are repeated across three random runs, and mean results are reported. To ensure a fair comparison of task selection strategies, the first task is randomly chosen and fixed, maintaining consistent initial conditions. [...] During training, hyper-parameters for each method are selected using the validation sets of the first three tasks, following Chaudhry et al. (2018)."
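The setup above evaluates CL methods over a sequence of N = 10 tasks and reports mean results. A minimal sketch of how such sequences are typically scored, using the standard continual-learning definitions of average accuracy and forgetting over an accuracy matrix (these metric definitions are common CL practice and an assumption here, not the paper's exact evaluation code):

```python
# acc[i][j] = test accuracy on task j after training on tasks 0..i.
# Standard CL metrics (assumed definitions, not taken from the paper):
# average accuracy over all tasks at the end of the sequence, and
# average forgetting (best-ever accuracy minus final accuracy per task).

def average_accuracy(acc):
    """Mean accuracy over all seen tasks after the final task."""
    n = len(acc)
    return sum(acc[n - 1][j] for j in range(n)) / n

def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    n = len(acc)
    drops = [
        max(acc[i][j] for i in range(j, n)) - acc[n - 1][j]
        for j in range(n - 1)  # the last task cannot be forgotten yet
    ]
    return sum(drops) / len(drops)

# Toy 3-task example (illustrative numbers only; the paper uses N = 10).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.88, 0.00],
    [0.75, 0.85, 0.92],
]
print(average_accuracy(acc))    # (0.75 + 0.85 + 0.92) / 3 = 0.84
print(average_forgetting(acc))  # ((0.90 - 0.75) + (0.88 - 0.85)) / 2 = 0.09
```

In the paper's setting, each method would produce one such matrix per random run, with the reported numbers averaged over the three runs.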