PopulAtion Parameter Averaging (PAPA)

Authors: Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate in Section 4 that PAPA and its variants lead to substantial performance gains when training small network populations (2-10 networks) from scratch with low compute (1 GPU). Our method increases the average accuracy of the population by up to 0.8% on CIFAR-10 (5-10 networks), 1.9% on CIFAR-100 (5-10 networks), and 1.6% on ImageNet (2-3 networks)."
Researcher Affiliation | Collaboration | Alexia Jolicoeur-Martineau (Samsung SAIT AI Lab, Montreal); Emy Gervais (Independent); Kilian Fatras (Mila, McGill University); Yan Zhang (Samsung SAIT AI Lab, Montreal); Simon Lacoste-Julien (Mila, University of Montreal; Samsung SAIT AI Lab, Montreal; Canada CIFAR AI Chair)
Pseudocode | Yes | "Figure 1 shows an illustration of PAPA and Algorithm 1 provides the full description of PAPA and its variants (PAPA-all and PAPA-2)."
Open Source Code | No | The paper does not provide a link to source code or an affirmative statement of code release. The abstract states, "Our code will be released upon publication", which indicates future availability, not current access.
Open Datasets | Yes | "For image classification, we train models from scratch on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009); we also fine-tune pre-trained models on CIFAR-100. For image segmentation, we train models from scratch on ISPRS Vaihingen (Rottensteiner et al., 2012)."
Dataset Splits | Yes | "For image classification, we only have access to train and test data; thereby, we remove 2% of the training data to use as evaluation data for the greedy soups. For the Vaihingen dataset (Rottensteiner et al., 2012), we follow the training procedure and PyTorch implementation from (Audebert et al., 2017). We use a UNet (Ronneberger et al., 2015) and the train, validation, and test splits from (Fatras et al., 2021). We use 11 tiles for training, 5 tiles for validation, and the remaining 17 tiles for testing our model."
Hardware Specification | Yes | "For all experiments, we use a single GPU: A100 40Gb (for ImageNet) or V100 16Gb (for all other experiments)."
Software Dependencies | No | The paper mentions software such as PyTorch and optimizers such as SGD, Adam, and AdamW, but does not provide version numbers for these dependencies, which a fully reproducible description requires.
Experiment Setup | Yes | "For training-from-scratch on CIFAR-10 and CIFAR-100, training is done over 300 epochs with a cosine learning rate (1e-1 to 1e-4) (Loshchilov and Hutter, 2016) using SGD with a weight decay of 1e-4. Batch size is 64 and REPAIR uses 5 forward-passes. For training-from-scratch on ImageNet, training is done over 90 epochs with a cosine learning rate (1e-1 to 1e-4) (Loshchilov and Hutter, 2016) using SGD with a weight decay of 1e-4. Batch size is 256 and REPAIR is not used."
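The population-averaging idea behind PAPA can be sketched in plain Python, with toy weight vectors standing in for network parameters. This is an illustrative reimplementation, not the authors' Algorithm 1: the helper names and the push rate `alpha` are assumptions, not values from the paper.

```python
# Toy sketch of PAPA-style population averaging.
# Each "network" is a plain list of floats; `alpha` is a hypothetical
# push rate, not a value taken from the paper.

def population_mean(population):
    """Element-wise mean of a list of weight vectors."""
    n = len(population)
    return [sum(w[i] for w in population) / n
            for i in range(len(population[0]))]

def papa_push(population, alpha=0.1):
    """PAPA: nudge every member slightly toward the population mean."""
    mean = population_mean(population)
    return [[(1 - alpha) * wi + alpha * mi for wi, mi in zip(w, mean)]
            for w in population]

def papa_all(population):
    """PAPA-all: replace every member with the full population average."""
    mean = population_mean(population)
    return [list(mean) for _ in population]
```

This sketch shows only the averaging arithmetic, not the training loop; in the paper, the population members keep training between averaging operations so they do not collapse onto one another.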
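The 2% hold-out for greedy soups mentioned in the Dataset Splits row can be sketched as follows; this is a minimal sketch, and the function name, the seed, and the use of random shuffling are assumptions, not details from the paper.

```python
import random

def holdout_split(indices, frac=0.02, seed=0):
    """Set aside `frac` of the training indices as evaluation data
    (e.g. for selecting greedy-soup members); the rest remain for
    training. `seed` is an illustrative choice, not from the paper."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * frac))
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)
```

For CIFAR-10's 50,000 training images, a 2% hold-out yields 1,000 evaluation examples and leaves 49,000 for training.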
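The cosine learning-rate schedule (1e-1 to 1e-4) quoted in the Experiment Setup row corresponds to the standard cosine-annealing formula of Loshchilov and Hutter (2016). A minimal sketch, assuming per-epoch granularity (the paper only states the range and epoch counts):

```python
import math

def cosine_lr(epoch, total_epochs=300, lr_max=1e-1, lr_min=1e-4):
    """Cosine annealing from lr_max down to lr_min over total_epochs
    (300 for CIFAR, 90 for ImageNet in the paper). Per-epoch rather
    than per-step decay is an assumption."""
    t = epoch / total_epochs  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

The schedule starts at `lr_max` (epoch 0), reaches roughly the midpoint of the range halfway through, and ends at `lr_min`; PyTorch's `CosineAnnealingLR` implements the same formula.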