$C^2M^3$: Cycle-Consistent Multi-Model Merging
Authors: Donato Crisostomi, Marco Fumero, Daniele Baieri, Florian Bernard, Emanuele Rodolà
NeurIPS 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We qualitatively and quantitatively motivate the need for such a constraint, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. |
| Researcher Affiliation | Academia | Donato Crisostomi (Sapienza University of Rome), Marco Fumero (Institute of Science and Technology Austria), Daniele Baieri (Sapienza University of Rome), Florian Bernard (University of Bonn), Emanuele Rodolà (Sapienza University of Rome) |
| Pseudocode | Yes | Algorithm 1 Frank-Wolfe for n-Model Matching |
| Open Source Code | Yes | Finally, to foster reproducible research in the field, we release a modular and reusable codebase containing implementations of our approach and the considered baselines. https://github.com/crisostomi/cycle-consistent-model-merging |
| Open Datasets | Yes | We employ the most common datasets for image classification tasks: MNIST [9], CIFAR-10 [23], EMNIST [7] and CIFAR-100 [23], having 10, 10, 26 and 100 classes respectively. We use the standard train-test splits provided by torchvision for all datasets. |
| Dataset Splits | Yes | We use the standard train-test splits provided by torchvision for all datasets. |
| Hardware Specification | Yes | All of the experiments were carried out using consumer hardware, in particular mostly on a 32GiB RAM machine with a 12th Gen Intel(R) Core(TM) i7-12700F processor and an Nvidia RTX 3090 GPU, except for some of the experiments that were carried out on a 2080. |
| Software Dependencies | No | The paper mentions software like "PyTorch", "PyTorch Lightning", and "NN-Template" but does not specify their version numbers, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | In particular, we train most of our models with a batch size of 100 for 250 epochs, using SGD with momentum 0.9, a learning rate of 0.1, and a weight decay of $10^{-4}$. We use a cosine annealing learning rate scheduler with a warm restart period of 10 epochs and a minimum learning rate of 0. |
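The pseudocode row above names "Algorithm 1 Frank-Wolfe for n-Model Matching". The paper's n-model formulation is not reproduced here, but the core Frank-Wolfe idea can be sketched on the simpler two-model case: relax permutation matching to the doubly stochastic polytope, and note that each linearized subproblem is a linear assignment problem. Everything below (the objective, the function name, the rounding step) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_match(W_a, W_b, iters=100):
    """Hypothetical sketch: solve min_P ||W_a - P @ W_b||_F^2 over
    doubly stochastic P with Frank-Wolfe, then round to a permutation."""
    n = W_a.shape[0]
    P = np.full((n, n), 1.0 / n)  # barycenter of the Birkhoff polytope
    for k in range(iters):
        grad = -2.0 * (W_a - P @ W_b) @ W_b.T
        # Linear minimization oracle: the permutation minimizing <grad, S>
        # is exactly a linear assignment problem.
        rows, cols = linear_sum_assignment(grad)
        S = np.zeros_like(P)
        S[rows, cols] = 1.0
        gamma = 2.0 / (k + 2.0)  # standard Frank-Wolfe step size
        P = P + gamma * (S - P)
    # Round the relaxed solution back to the nearest permutation.
    rows, cols = linear_sum_assignment(-P)
    P_perm = np.zeros_like(P)
    P_perm[rows, cols] = 1.0
    return P_perm
```

On synthetic data where `W_a` is an exact permutation of the rows of `W_b`, this recovers the planted permutation; the paper's contribution is extending such pairwise matchings to n models under a cycle-consistency constraint.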
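The quoted hyperparameters map directly onto standard PyTorch APIs. A minimal sketch follows; the `torch.nn.Linear` model is a placeholder (the paper trains various architectures), and only the optimizer and scheduler settings come from the quoted setup:

```python
import torch

# Placeholder model; the paper's experiments use several architectures.
model = torch.nn.Linear(784, 10)

# SGD with momentum 0.9, learning rate 0.1, weight decay 10^-4,
# as quoted in the experiment-setup row.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    weight_decay=1e-4,
)

# Cosine annealing with a warm restart period of 10 epochs and a
# minimum learning rate of 0; stepped once per epoch for 250 epochs
# with batch size 100 in the quoted setup.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, eta_min=0.0
)
```
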