A Second-Order Perspective on Model Compositionality and Incremental Learning
Authors: Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega, Monica Millunzi, Simone Calderara, Rita Cucchiara
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin to achieve composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one takes the perspective of multiple models trained individually, while the other optimizes the composed model as a whole. We probe their application in incremental classification tasks and highlight some valuable skills. In fact, the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization in certain tasks. Code available at https://github.com/aimagelab/mammoth. Section 5: EXPERIMENTS. As reported in Tab. 1, ITA and IEL outperform existing approaches on all datasets except MIT-67 and Crop Disease. All methods, including ours, utilize the same backbone, a ViT-B/16 (Dosovitskiy et al., 2021) with supervised pre-training on ImageNet-21K (Ridnik et al., 2021), and the same batch size (128). We compute the accuracy on all classes at the end of the final task (Final Accuracy, FA). |
| Researcher Affiliation | Collaboration | 1University of Modena and Reggio Emilia, Italy 2Axyon AI, Italy 3IIT-CNR, Italy |
| Pseudocode | Yes | Algorithm 1 Incremental Task Arithmetic (ITA) vs. Incremental Ensemble Learning (IEL) |
| Open Source Code | Yes | Code available at https://github.com/aimagelab/mammoth. |
| Open Datasets | Yes | Datasets. Following related works (Wang et al., 2022b; Bowman et al., 2023; Liu & Soatto, 2023), we evaluate on these class-incremental benchmarks: Split ImageNet-R (Hendrycks et al., 2021) (10 tasks, 20 classes each), Split CIFAR-100 (Krizhevsky et al., 2009) (10 tasks, 10 classes each), Split CUB-200 (Wah et al., 2011) (10 tasks, 20 classes each), Split Caltech-256 (Griffin et al., 2007) (10 tasks, as in (Liu & Soatto, 2023)), and Split MIT-67 (Quattoni & Torralba, 2009) (10 tasks, as in (Liu & Soatto, 2023)). We conduct further tests on the aerial and medical domains using Split RESISC45 (Cheng et al., 2017) (9 tasks, 5 classes each) and Split Crop Diseases (Hughes et al., 2015) (7 tasks, 5 classes each). |
| Dataset Splits | Yes | Following Buzzega et al. (2020b), the hyperparameters are chosen through a grid search on a validation set (i.e., 10% of the training set). Standard domains: Split CIFAR-100 (Krizhevsky et al., 2009) and Split ImageNet-R (Hendrycks et al., 2021), with 100 and 200 classes respectively, split into 10 tasks. We train each task of Split ImageNet-R for 30 epochs and each task of Split CIFAR-100 for 20 epochs. Following (Liu & Soatto, 2023), we also employ Split Caltech-256 (Griffin et al., 2007) and Split MIT-67 (Quattoni & Torralba, 2009), dividing both into 10 tasks (5 epochs each). Specialized domain: We adopt Split CUB-200 (Wah et al., 2011)... The classes are split across 10 tasks, each lasting for 50 epochs. Aerial domain: we use Split RESISC45 (Cheng et al., 2017)... The dataset contains 45 classes (e.g., airport, cloud, island, and so on) divided into 9 tasks, with each task lasting 30 epochs. Medical domain: we finally explore the medical setting (i.e., plant diseases) and conduct experiments on Split Crop Diseases (Hughes et al., 2015). It concerns infected leaves, with 7 tasks of 5 classes each (5 epochs). |
| Hardware Specification | No | The paper mentions 'access to GPU clusters' in the introduction and acknowledges 'the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support.' However, it does not specify any particular GPU models (e.g., NVIDIA A100), CPU types, or detailed configurations of these computing resources. |
| Software Dependencies | No | The paper mentions 'Our code on Mammoth (Buzzega et al., 2020b;a)' and 'AdamW optimizer (Loshchilov & Hutter, 2019)'. However, it does not provide specific version numbers for Mammoth, Python, PyTorch, or the AdamW optimizer library. |
| Experiment Setup | Yes | All methods, including ours, utilize the same backbone, a ViT-B/16 (Dosovitskiy et al., 2021) with supervised pre-training on ImageNet-21K (Ridnik et al., 2021), and the same batch size (128). Following Buzzega et al. (2020b), the hyperparameters are chosen through a grid search on a validation set. In each experiment, we use the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 3 × 10⁻⁴ for LoRA and (IA)³ fine-tuning, and 1 × 10⁻⁴ for full fine-tuning. For both ITA and IEL: #epochs_pre-tuning = 3; lr_pre-tuning = 1.0 × 10⁻² (specific hyperparameters for each dataset are detailed in Appendix I). |
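The composition idea behind ITA (Incremental Task Arithmetic) can be illustrated with a minimal sketch: each incrementally learned module is kept as a task vector (the parameter delta from the shared pre-trained weights), and a multi-task model is composed by summing selected task vectors back onto the pre-trained weights. The sketch below uses plain Python dicts of scalars in place of real network tensors; the function names and the scaling factor `alpha` are illustrative, not the paper's API.

```python
def task_vector(pretrained, finetuned):
    """Task vector: per-parameter delta induced by fine-tuning on one task."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def compose(pretrained, task_vectors, alpha=1.0):
    """Compose a multi-task model: pretrained weights plus the (scaled) sum of task vectors."""
    merged = dict(pretrained)
    for tv in task_vectors:
        for k, delta in tv.items():
            merged[k] += alpha * delta
    return merged

# Toy example with scalar "parameters"
theta0 = {"w": 1.0, "b": 0.0}   # pre-trained weights
theta1 = {"w": 1.5, "b": 0.2}   # fine-tuned on task 1
theta2 = {"w": 0.8, "b": -0.1}  # fine-tuned on task 2

merged = compose(theta0, [task_vector(theta0, theta1),
                          task_vector(theta0, theta2)])
print(merged)
```

Dropping a task vector from the sum corresponds to the unlearning capability mentioned in the abstract; keeping only one vector specializes the composed model to that task.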
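The 10% validation hold-out and hyperparameter grid search quoted in the Dataset Splits row can be sketched as follows. Only the 10% fraction, the batch size of 128, and the learning-rate values come from the excerpts above; the helper names, seed, and grid-search driver are illustrative.

```python
import itertools
import random

def split_train_val(indices, val_frac=0.10, seed=0):
    """Hold out a fraction of the training set for validation (10% per the paper)."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_val = int(len(idx) * val_frac)
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

# Grid built from the values quoted above; each config would be scored on the
# validation split and the best one kept.
grid = {"lr": [3e-4, 1e-4], "batch_size": [128]}
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]

train_idx, val_idx = split_train_val(range(1000))
print(len(train_idx), len(val_idx), configs)
```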