$C^2M^3$: Cycle-Consistent Multi-Model Merging
Authors: Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà
NeurIPS 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. |
| Researcher Affiliation | Academia | Donato Crisostomi (Sapienza University of Rome), Marco Fumero (Institute of Science and Technology Austria), Daniele Baieri (Sapienza University of Rome), Florian Bernard (University of Bonn), Emanuele Rodolà (Sapienza University of Rome) |
| Pseudocode | Yes | Algorithm 1 Frank-Wolfe for n-Model Matching |
| Open Source Code | Yes | Finally, to foster reproducible research in the field, we release a modular and reusable codebase containing implementations of our approach and the considered baselines. https://github.com/crisostomi/cycle-consistent-model-merging |
| Open Datasets | Yes | We employ the most common datasets for image classification tasks: MNIST [9], CIFAR-10 [23], EMNIST [7] and CIFAR-100 [23], having 10, 10, 26 and 100 classes respectively. We use the standard train-test splits provided by torchvision for all datasets. |
| Dataset Splits | Yes | We use the standard train-test splits provided by torchvision for all datasets. |
| Hardware Specification | Yes | All of the experiments were carried out using consumer hardware, in particular mostly on a 32GiB RAM machine with a 12th Gen Intel(R) Core(TM) i7-12700F processor and an Nvidia RTX 3090 GPU, except for some of the experiments that were carried out on a 2080. |
| Software Dependencies | No | The paper mentions software like "PyTorch", "PyTorch Lightning", and "NN-Template" but does not specify their version numbers, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | In particular, we train most of our models with a batch size of 100 for 250 epochs, using SGD with momentum 0.9, a learning rate of 0.1, and a weight decay of $10^{-4}$. We use a cosine annealing learning rate scheduler with a warm restart period of 10 epochs and a minimum learning rate of 0. |
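The pseudocode row above names "Algorithm 1 Frank-Wolfe for n-Model Matching". The paper's n-model formulation is not reproduced here, but the core Frank-Wolfe idea can be sketched on the simpler two-model case: relax permutation matching to the doubly stochastic polytope, and note that each linearized subproblem is a linear assignment problem. Everything below (the objective, the function name, the rounding step) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_match(W_a, W_b, iters=100):
    """Hypothetical sketch: solve min_P ||W_a - P @ W_b||_F^2 over
    doubly stochastic P with Frank-Wolfe, then round to a permutation."""
    n = W_a.shape[0]
    P = np.full((n, n), 1.0 / n)  # barycenter of the Birkhoff polytope
    for k in range(iters):
        grad = -2.0 * (W_a - P @ W_b) @ W_b.T
        # Linear minimization oracle: the permutation minimizing <grad, S>
        # is exactly a linear assignment problem.
        rows, cols = linear_sum_assignment(grad)
        S = np.zeros_like(P)
        S[rows, cols] = 1.0
        gamma = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        P = P + gamma * (S - P)
    # Round the relaxed solution back to the nearest permutation.
    rows, cols = linear_sum_assignment(-P)
    P_perm = np.zeros_like(P)
    P_perm[rows, cols] = 1.0
    return P_perm
```

On synthetic data where `W_a` is an exact permutation of the rows of `W_b`, this recovers the planted permutation; the paper's contribution is extending such pairwise matchings to n models under a cycle-consistency constraint.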
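The quoted hyperparameters map directly onto standard PyTorch APIs. A minimal sketch follows; the `torch.nn.Linear` model is a placeholder (the paper trains various architectures), and only the optimizer and scheduler settings come from the quoted setup:

```python
import torch

# Placeholder model; the paper's experiments use several architectures.
model = torch.nn.Linear(784, 10)

# SGD with momentum 0.9, learning rate 0.1, weight decay 10^-4,
# as quoted in the experiment-setup row.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-4,
)

# Cosine annealing with a warm restart period of 10 epochs and a
# minimum learning rate of 0; stepped once per epoch for 250 epochs
# with batch size 100 in the quoted setup.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, eta_min=0.0
)
```
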