RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
Authors: Nazia Tasnim, Bryan Plummer
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across six diverse datasets demonstrate RECAST outperforms the state-of-the-art by up to 1.5% and improves baselines > 3% across various scales, architectures, and parameter spaces. We evaluate RECAST's effectiveness on diverse datasets in a Task-incremental IL setting. We compare against 16 baselines comprising state-of-the-art IL methods on both CNN and Transformer architectures, where RECAST reports > 3% gain over prior work. |
| Researcher Affiliation | Academia | Nazia Tasnim Boston University EMAIL Bryan A. Plummer Boston University EMAIL |
| Pseudocode | Yes | Algorithm 1 RECAST Framework; Algorithm 2 Neural Mimicry |
| Open Source Code | Yes | Code: Repository; The implementation details of our custom ResNet and Vision Transformer architectures are fully described in the main text, with complete code provided in the supplementary materials. |
| Open Datasets | Yes | Following standard experiment setups in similar works by Aljundi et al. (2019); Ge et al. (2023), we employ six diverse benchmarking datasets covering fine-grained to coarse image classification tasks across various domains, including flowers (Nilsback & Zisserman, 2008), scenes (Quattoni & Torralba, 2009), birds (Wah et al., 2011), animals (Krizhevsky & Hinton, 2009), vehicles (Maji et al., 2013), and other man-made objects (Krizhevsky & Hinton, 2009). In Table 3 we have summarized the class variations and number of samples for the six datasets we have used in our TIL experiments. |
| Dataset Splits | Yes | All the datasets were split with a 75%-15%-15% train-validation-test split. |
| Hardware Specification | Yes | All experiments were run on a single RTX8000 GPU. |
| Software Dependencies | No | We trained GDUMB (Prabhu et al., 2020), EWC (Lee et al., 2019), LWF (Li & Hoiem, 2016), and L2P (Wang et al., 2022b) using the avalanche library (Carta et al., 2023). Official PyTorch implementations of other methods were modified for TIL settings. All experiments were conducted using PyTorch, with specific versions and dependencies listed in the supplementary materials. The main text does not provide specific version numbers for PyTorch or avalanche. |
| Experiment Setup | Yes | RECAST, MeLo (Zhu et al., 2024), AdaptFormer (Chen et al., 2022), RoSA (Nikdan et al., 2024), and DoRA (Liu et al., 2024b) used AdamW (Loshchilov & Hutter, 2017) with 2e-3 to 5e-3 learning rates, 1e-6 weight decay, and stepwise LR scheduling (decay by 0.1 every 33 epochs) for 100 epochs. Default hyperparameters were used for avalanche models and methods like HAT (Serra et al., 2018), Piggyback (Mallya et al., 2018), and CLR (Ge et al., 2023), trained for 100 epochs. For RECAST-ViT, we used a group size of 6, 2 templates per bank, and 2 coefficient sets. |
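The stepwise learning-rate schedule quoted above (decay by 0.1 every 33 epochs over a 100-epoch run) can be sketched in plain Python. This is an illustrative reconstruction, not the paper's code; the function name `step_lr` and its parameters are assumptions made for the sketch.

```python
# Illustrative sketch of a StepLR-style schedule matching the reported setup:
# learning rate decays by a factor of 0.1 every 33 epochs over 100 epochs.
# `step_lr` and its argument names are hypothetical, not from the paper's code.

def step_lr(base_lr: float, epoch: int, step: int = 33, gamma: float = 0.1) -> float:
    """Return the learning rate at a given epoch under stepwise decay."""
    return base_lr * gamma ** (epoch // step)

# With base_lr = 2e-3 (the low end of the reported 2e-3 to 5e-3 range),
# the schedule plateaus at 2e-3 (epochs 0-32), 2e-4 (33-65), 2e-5 (66-98),
# and 2e-6 (epoch 99).
schedule = [step_lr(2e-3, e) for e in range(100)]
```

In PyTorch this behavior corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=33, gamma=0.1)` stepped once per epoch, paired with an `AdamW` optimizer using `weight_decay=1e-6`.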