Image retrieval outperforms diffusion models on data augmentation
Authors: Max F Burg, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco Locatello, Chris Russell
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. ... We benchmark a variety of existing augmentation techniques based on stable diffusion models ... We obtained all classification performance numbers by training a ResNet-50 (He et al., 2016). |
| Researcher Affiliation | Collaboration | 1International Max Planck Research School for Intelligent Systems, Tübingen, Germany 2Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany 3Tübingen AI Center, University of Tübingen, Germany 4Mirelo AI 5ELLIS Institute Tübingen, Germany 6GlaxoSmithKline, AI&ML, Zug, Switzerland 7Amazon, Tübingen, Germany 8Institute of Science and Technology Austria 9Oxford Internet Institute, University of Oxford, United Kingdom |
| Pseudocode | No | The paper describes methods and protocols in detailed prose and refers to figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a 'clip-retrieval package (Beaumont, 2022)' which implies an external tool, but there is no explicit statement from the authors of this paper releasing their own implementation code for the methodologies described. |
| Open Datasets | Yes | ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015)... LAION-5B (Schuhmann et al., 2022)... Caltech256 (Griffin et al., 2007). |
| Dataset Splits | Yes | Hence, unless explicitly stated, we use a 10% sample (126,861 training images) of the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) (Russakovsky et al., 2015) training split... We additionally sample a disjoint set of images of the same size from the training split for hyperparameter optimization and model selection, which we refer to as the validation split. |
| Hardware Specification | Yes | retrieval was the fastest method, only requiring a CPU machine with 4 GB of CPU RAM and a 1.6 TB SSD disk... Time-requirements for DM fine-tuning depended on the specific method: ... trained on 8 Nvidia V100 GPUs using distributed data parallel training... Generating 390 images for each ImageNet class required 867 GPU h (Nvidia T4 GPU)... ResNet-50 classifier training ... trained on 8 Nvidia T4 GPUs using distributed data parallel training. |
| Software Dependencies | Yes | To generate images, we used the pretrained Stable Diffusion v1.4 model... The dataset provides a CLIP embedding nearest neighbors search index for each instance, and an official package (Beaumont, 2022) allows for fast search and retrieval. We used this to retrieve... We applied the duplicate detector of the clip-retrieval package (Beaumont, 2022). |
| Experiment Setup | Yes | On ImageNet we trained the classifier with a batch size of 256 and a learning rate of 0.1. On Caltech256 we used a batch size of 1024 and after an initial learning rate optimization sweep we set the learning rate to 0.003. We divided the learning rate by 10 when validation accuracy failed to improve for 10 epochs. We stopped training when the validation accuracy did not improve for 20 epochs or after at most 270 epochs... We trained all models with the default Stable Diffusion optimization objective (Rombach et al., 2022) until the validation loss stagnated or increased; no model was trained for more than 40 epochs. |
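The classifier training schedule quoted in the Experiment Setup row (divide the learning rate by 10 on a 10-epoch validation-accuracy plateau; stop after a 20-epoch plateau or 270 epochs total) can be sketched as a small stateful scheduler. This is a minimal illustration of the protocol as described, not the authors' released code; the class name and internals are assumptions.

```python
class PlateauSchedule:
    """Sketch of the paper's described schedule: LR /= 10 after 10 epochs
    without validation-accuracy improvement; stop after 20 such epochs
    or at most 270 epochs total. Implementation details are assumed."""

    def __init__(self, lr=0.1, decay_patience=10, stop_patience=20, max_epochs=270):
        self.lr = lr
        self.decay_patience = decay_patience
        self.stop_patience = stop_patience
        self.max_epochs = max_epochs
        self.best_acc = float("-inf")
        self.epochs_since_best = 0
        self.epoch = 0

    def step(self, val_acc):
        """Record one epoch's validation accuracy; return False to stop training."""
        self.epoch += 1
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.epochs_since_best = 0
        else:
            self.epochs_since_best += 1
            if self.epochs_since_best % self.decay_patience == 0:
                self.lr /= 10  # decay on a 10-epoch plateau
        # stop on a 20-epoch plateau or at the epoch budget
        return self.epochs_since_best < self.stop_patience and self.epoch < self.max_epochs
```

For example, with the ImageNet base rate of 0.1, ten consecutive non-improving epochs would drop the rate to 0.01, and ten more would end training.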