Image retrieval outperforms diffusion models on data augmentation

Authors: Max F Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. ... We benchmark a variety of existing augmentation techniques based on stable diffusion models ... We obtained all classification performance numbers by training a ResNet-50 (He et al., 2016)."
Researcher Affiliation | Collaboration | 1: International Max Planck Research School for Intelligent Systems, Tübingen, Germany; 2: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; 3: Tübingen AI Center, University of Tübingen, Germany; 4: Mirelo AI; 5: ELLIS Institute Tübingen, Germany; 6: GlaxoSmithKline, AI&ML, Zug, Switzerland; 7: Amazon, Tübingen, Germany; 8: Institute of Science and Technology Austria; 9: Oxford Internet Institute, University of Oxford, United Kingdom
Pseudocode | No | The paper describes methods and protocols in detailed prose and refers to figures, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using a "clip-retrieval package (Beaumont, 2022)", which is an external tool, but the authors make no explicit statement that they release their own implementation of the methods described.
Open Datasets | Yes | "ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015)... LAION-5B (Schuhmann et al., 2022)... Caltech256 (Griffin et al., 2007)."
Dataset Splits | Yes | "Hence, unless explicitly stated, we use a 10% sample (126,861 training images) of the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015) training split... We additionally sample a disjoint set of images of the same size from the training split for hyperparameter optimization and model selection, which we refer to as the validation split. ... On Caltech256 we used a batch size of 1024 and after an initial learning rate optimization sweep we set the learning rate to 0.003."
Hardware Specification | Yes | "Retrieval was the fastest method, only requiring a CPU machine with 4 GB of CPU RAM and a 1.6 TB SSD disk... Time requirements for DM fine-tuning depended on the specific method: ... trained on 8 Nvidia V100 GPUs using distributed data parallel training... Generating 390 images for each ImageNet class required 867 GPU hours (Nvidia T4 GPU)... ResNet-50 classifier training ... trained on 8 Nvidia T4 GPUs using distributed data parallel training."
Software Dependencies | Yes | "To generate images, we used the pretrained Stable Diffusion v1.4 model... The dataset provides a CLIP embedding nearest neighbors search index for each instance, and an official package (Beaumont, 2022) allows for fast search and retrieval. We used this to retrieve... We applied the duplicate detector of the clip-retrieval package (Beaumont, 2022)."
Experiment Setup | Yes | "On ImageNet we trained the classifier with a batch size of 256 and a learning rate of 0.1. On Caltech256 we used a batch size of 1024, and after an initial learning rate optimization sweep we set the learning rate to 0.003. We divided the learning rate by 10 when validation accuracy failed to improve for 10 epochs. We stopped training when the validation accuracy did not improve for 20 epochs or after at most 270 epochs... We trained all models with the default Stable Diffusion optimization objective (Rombach et al., 2022) until the validation loss stagnated or increased; no model was trained for more than 40 epochs."
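The dataset-split protocol quoted above (a ~10% sample of the ILSVRC2012 training split plus a disjoint, equally sized validation sample drawn from the same split) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, the toy IDs, and the seed are assumptions.

```python
import random

def make_disjoint_splits(image_ids, subset_size, seed=0):
    """Sample a training subset and a disjoint, equally sized
    validation subset from a full training split.

    Mirrors the quoted protocol (e.g. a 126,861-image ~10% ImageNet
    sample plus a disjoint validation sample of the same size).
    """
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    train = shuffled[:subset_size]                # ~10% training sample
    val = shuffled[subset_size:2 * subset_size]   # disjoint validation sample
    return train, val

# Toy usage on 100 fake image IDs with 10-image subsets.
train, val = make_disjoint_splits(range(100), 10)
assert len(train) == len(val) == 10
assert not set(train) & set(val)  # the two samples share no images
```

Shuffling once and slicing two adjacent windows guarantees the validation sample is disjoint from the training sample by construction.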
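The classifier-training schedule in the Experiment Setup row follows three stated rules: divide the learning rate by 10 after 10 epochs without validation improvement, stop after 20 epochs without improvement, and never exceed 270 epochs. A minimal sketch of that logic, replayed over a list of per-epoch validation accuracies (the function name and defaults are illustrative, not the authors' implementation):

```python
def plateau_schedule(val_accuracies, lr0=0.1, patience_lr=10,
                     patience_stop=20, max_epochs=270):
    """Replay the quoted schedule on per-epoch validation accuracies.

    Returns the final learning rate and the epoch at which training
    would have stopped under the stated rules.
    """
    lr = lr0
    best = float("-inf")
    since_best = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best % patience_lr == 0:
                lr /= 10  # decay on every 10-epoch plateau
        if since_best >= patience_stop or epoch >= max_epochs:
            break
    return lr, epoch

# One good epoch followed by a long plateau: two decays, then early stop.
lr, stop_epoch = plateau_schedule([0.5] + [0.0] * 30)
assert stop_epoch == 21           # 20 epochs without improvement
assert abs(lr - 0.001) < 1e-12    # 0.1 / 10 / 10
```

With steadily improving accuracy the plateau counter never advances, so the schedule keeps the initial learning rate and runs until the epoch budget is exhausted.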