Image retrieval outperforms diffusion models on data augmentation

Authors: Max F Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. ... We benchmark a variety of existing augmentation techniques based on stable diffusion models ... We obtained all classification performance numbers by training a ResNet-50 (He et al., 2016)."
Researcher Affiliation | Collaboration | 1: International Max Planck Research School for Intelligent Systems, Tübingen, Germany; 2: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; 3: Tübingen AI Center, University of Tübingen, Germany; 4: Mirelo AI; 5: ELLIS Institute Tübingen, Germany; 6: GlaxoSmithKline, AI&ML, Zug, Switzerland; 7: Amazon, Tübingen, Germany; 8: Institute of Science and Technology Austria; 9: Oxford Internet Institute, University of Oxford, United Kingdom
Pseudocode | No | The paper describes methods and protocols in detailed prose and refers to figures, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using a "clip-retrieval package (Beaumont, 2022)", which is an external tool, but the authors make no explicit statement that they release their own implementation of the methods described.
Open Datasets | Yes | "ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015)... LAION-5B (Schuhmann et al., 2022)... Caltech256 (Griffin et al., 2007)."
Dataset Splits | Yes | "Hence, unless explicitly stated, we use a 10% sample (126,861 training images) of the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015) training split... We additionally sample a disjoint set of images of the same size from the training split for hyperparameter optimization and model selection, which we refer to as the validation split. ... On Caltech256 we used a batch size of 1024 and after an initial learning rate optimization sweep we set the learning rate to 0.003."
Hardware Specification | Yes | "Retrieval was the fastest method, only requiring a CPU machine with 4 GB of CPU RAM and a 1.6 TB SSD disk... Time requirements for DM fine-tuning depended on the specific method: ... trained on 8 Nvidia V100 GPUs using distributed data parallel training... Generating 390 images for each ImageNet class required 867 GPU hours (Nvidia T4 GPU)... ResNet-50 classifier training ... trained on 8 Nvidia T4 GPUs using distributed data parallel training."
Software Dependencies | Yes | "To generate images, we used the pretrained Stable Diffusion v1.4 model... The dataset provides a CLIP embedding nearest neighbors search index for each instance, and an official package (Beaumont, 2022) allows for fast search and retrieval. We used this to retrieve... We applied the duplicate detector of the clip-retrieval package (Beaumont, 2022)."
Experiment Setup | Yes | "On ImageNet we trained the classifier with a batch size of 256 and a learning rate of 0.1. On Caltech256 we used a batch size of 1024, and after an initial learning rate optimization sweep we set the learning rate to 0.003. We divided the learning rate by 10 when validation accuracy failed to improve for 10 epochs. We stopped training when the validation accuracy did not improve for 20 epochs or after at most 270 epochs... We trained all models with the default Stable Diffusion optimization objective (Rombach et al., 2022) until the validation loss stagnated or increased; no model was trained for more than 40 epochs."
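The dataset-split protocol quoted above (a ~10% sample of the ILSVRC2012 training split plus a disjoint, equally sized validation sample drawn from the same split) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the function name, the toy IDs, and the seed are assumptions.

```python
import random

def make_disjoint_splits(image_ids, subset_size, seed=0):
    """Sample a training subset and a disjoint, equally sized
    validation subset from a full training split.

    Mirrors the quoted protocol (e.g. a 126,861-image ~10% ImageNet
    sample plus a disjoint validation sample of the same size).
    """
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    train = shuffled[:subset_size]                # ~10% training sample
    val = shuffled[subset_size:2 * subset_size]   # disjoint validation sample
    return train, val

# Toy usage on 100 fake image IDs with 10-image subsets.
train, val = make_disjoint_splits(range(100), 10)
assert len(train) == len(val) == 10
assert not set(train) & set(val)  # the two samples share no images
```

Shuffling once and slicing two adjacent windows guarantees the validation sample is disjoint from the training sample by construction.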
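The classifier-training schedule in the Experiment Setup row follows three stated rules: divide the learning rate by 10 after 10 epochs without validation improvement, stop after 20 epochs without improvement, and never exceed 270 epochs. A minimal sketch of that logic, replayed over a list of per-epoch validation accuracies (the function name and defaults are illustrative, not the authors' implementation):

```python
def plateau_schedule(val_accuracies, lr0=0.1, patience_lr=10,
                     patience_stop=20, max_epochs=270):
    """Replay the quoted schedule on per-epoch validation accuracies.

    Returns the final learning rate and the epoch at which training
    would have stopped under the stated rules.
    """
    lr = lr0
    best = float("-inf")
    since_best = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best % patience_lr == 0:
                lr /= 10  # decay on every 10-epoch plateau
        if since_best >= patience_stop or epoch >= max_epochs:
            break
    return lr, epoch

# One good epoch followed by a long plateau: two decays, then early stop.
lr, stop_epoch = plateau_schedule([0.5] + [0.0] * 30)
assert stop_epoch == 21           # 20 epochs without improvement
assert abs(lr - 0.001) < 1e-12    # 0.1 / 10 / 10
```

With steadily improving accuracy the plateau counter never advances, so the schedule keeps the initial learning rate and runs until the epoch budget is exhausted.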