$C^2M^3$: Cycle-Consistent Multi-Model Merging

Authors: Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà

Venue: NeurIPS 2024

Each reproducibility variable is listed below with its assessed result and the supporting LLM response:
Research Type: Experimental. Evidence: "We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets."
Researcher Affiliation: Academia. Evidence: Donato Crisostomi (Sapienza University of Rome), Marco Fumero (Institute of Science and Technology Austria), Daniele Baieri (Sapienza University of Rome), Florian Bernard (University of Bonn), Emanuele Rodolà (Sapienza University of Rome).
Pseudocode: Yes. Evidence: Algorithm 1, "Frank-Wolfe for n-Model Matching".
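The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the Frank-Wolfe mechanics it builds on, the sketch below matches the hidden units of just two weight matrices by relaxing permutations to doubly stochastic matrices; the bilinear objective, the function name, and the two-model simplification are all assumptions (the actual algorithm matches n models jointly under a cycle-consistency constraint).

```python
# Illustrative sketch only, NOT the paper's Algorithm 1: Frank-Wolfe over the
# Birkhoff polytope for two-model unit matching. Hypothetical names throughout.
import numpy as np
from scipy.optimize import linear_sum_assignment


def frank_wolfe_match(w_a: np.ndarray, w_b: np.ndarray, n_iters: int = 50) -> np.ndarray:
    """Approximately maximize <w_a, P w_b P^T> over permutation matrices P."""
    d = w_a.shape[0]
    p = np.full((d, d), 1.0 / d)  # start at the barycenter of the Birkhoff polytope
    for t in range(n_iters):
        # Gradient of the bilinear objective f(P) = <w_a, P w_b P^T>.
        grad = w_a @ p @ w_b.T + w_a.T @ p @ w_b
        # Linear oracle: a linear function over doubly stochastic matrices is
        # maximized at a permutation, found via a linear assignment problem.
        rows, cols = linear_sum_assignment(grad, maximize=True)
        s = np.zeros_like(p)
        s[rows, cols] = 1.0
        # Standard diminishing Frank-Wolfe step size.
        gamma = 2.0 / (t + 2.0)
        p = p + gamma * (s - p)
    # Round the relaxed solution back to the nearest permutation.
    rows, cols = linear_sum_assignment(p, maximize=True)
    perm = np.zeros_like(p)
    perm[rows, cols] = 1.0
    return perm
```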
Open Source Code: Yes. Evidence: "Finally, to foster reproducible research in the field, we release a modular and reusable codebase containing implementations of our approach and the considered baselines." (https://github.com/crisostomi/cycle-consistent-model-merging)
Open Datasets: Yes. Evidence: "We employ the most common datasets for image classification tasks: MNIST [9], CIFAR-10 [23], EMNIST [7] and CIFAR-100 [23], having 10, 10, 26 and 100 classes respectively. We use the standard train-test splits provided by torchvision for all datasets."
Dataset Splits: Yes. Evidence: "We use the standard train-test splits provided by torchvision for all datasets."
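For concreteness, here is a minimal sketch of pulling the standard torchvision train-test splits named above; the ToTensor transform, the root directory, and the choice of the EMNIST "letters" split (matching the 26 classes cited) are assumptions, not details confirmed by the paper.

```python
# Minimal sketch: standard torchvision train/test splits (assumed transform).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

def standard_splits(cls, **kwargs):
    """Return the (train, test) splits torchvision ships for a dataset class."""
    train = cls("data", train=True, download=True, transform=to_tensor, **kwargs)
    test = cls("data", train=False, download=True, transform=to_tensor, **kwargs)
    return train, test

mnist = standard_splits(datasets.MNIST)
cifar10 = standard_splits(datasets.CIFAR10)
cifar100 = standard_splits(datasets.CIFAR100)
# EMNIST requires a split name; "letters" matches the 26 classes cited above.
emnist = standard_splits(datasets.EMNIST, split="letters")
```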
Hardware Specification: Yes. Evidence: "All of the experiments were carried out using consumer hardware, in particular mostly on a 32 GiB RAM machine with a 12th Gen Intel(R) Core(TM) i7-12700F processor and an Nvidia RTX 3090 GPU, except for some of the experiments that were carried out on a 2080."
Software Dependencies: No. The paper mentions software such as PyTorch, PyTorch Lightning, and NN-Template but does not specify version numbers, which are required for reproducible software dependencies.
Experiment Setup: Yes. Evidence: "In particular, we train most of our models with a batch size of 100 for 250 epochs, using SGD with momentum 0.9, a learning rate of 0.1, and a weight decay of $10^{-4}$. We use a cosine annealing learning rate scheduler with a warm restart period of 10 epochs and a minimum learning rate of 0."
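A hedged PyTorch reconstruction of that setup follows; the placeholder model and the mapping of the "warm restart period of 10 epochs" onto CosineAnnealingWarmRestarts(T_0=10) are assumptions, while the optimizer and scheduler hyperparameters come directly from the quoted setup.

```python
# Sketch of the reported training configuration (placeholder model).
import torch

model = torch.nn.Linear(784, 10)  # hypothetical stand-in for the paper's models
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)
# T_0=10 restarts the cosine schedule every 10 epochs; eta_min=0.0 matches the
# stated minimum learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, eta_min=0.0
)

# Training skeleton: 250 epochs, batches of 100 from a DataLoader.
# for epoch in range(250):
#     for x, y in train_loader:  # DataLoader(train_set, batch_size=100, shuffle=True)
#         ...
#     scheduler.step()
```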