Four Things Everyone Should Know to Improve Batch Normalization

Authors: Cecilia Summers, Michael J. Dinneen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet.
Researcher Affiliation | Academia | Cecilia Summers, Department of Computer Science, University of Auckland, EMAIL; Michael J. Dinneen, Department of Computer Science, University of Auckland, EMAIL
Pseudocode | No | The paper contains mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We have released code at https://github.com/ceciliaresearch/four_things_batch_norm.
Open Datasets | Yes | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet. ... ImageNet ILSVRC 2012 validation set (Russakovsky et al., 2015) ... CIFAR-100 (Krizhevsky & Hinton, 2009) ... SVHN (Netzer et al., 2011) ... Flowers-102 (Nilsback & Zisserman, 2008) ... CUB-2011 (Wah et al., 2011)
Dataset Splits | Yes | Of the six datasets we experiment with, only ImageNet (Russakovsky et al., 2015) and Flowers-102 (Nilsback & Zisserman, 2008) have their own pre-defined validation split, so we constructed validation splits for the other datasets as follows: for CIFAR-100 (Krizhevsky & Hinton, 2009), we randomly took 40,000 of the 50,000 training images for the training split, and the remaining 10,000 as a validation split. For SVHN (Netzer et al., 2011), we similarly split the 604,388 non-test images in an 80-20% split for training and validation. For Caltech-256, no canonical splits of any form are defined, so we used 40 images of each of the 256 categories for training, 10 images for validation, and 30 for testing. For CUB-2011, we used 25% of the given training data as a validation set.
Hardware Specification | Yes | All experiments were done on two Nvidia GeForce GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions the TensorFlow-slim image classification model library but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The model used for CIFAR-100 and SVHN was ResNet-18 (He et al., 2016b;a) with 64, 128, 256, and 512 filters across blocks. For Caltech-256, a much larger Inception-v3 (Szegedy et al., 2016) model was used, and we additionally experiment with ResNet-152 (He et al., 2016b) on Flowers-102 and CUB-2011 in Sec. 4.3. All experiments were done on two Nvidia Geforce GTX 1080 Ti GPUs. ... with an overall batch size of B and a ghost batch size of B ... with a batch size B = 128.
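The split construction quoted under "Dataset Splits" is simple to reproduce mechanically. A minimal sketch with NumPy; the helper name and the random seed are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def make_split(num_examples, num_train, seed=0):
    """Randomly partition example indices into disjoint train/validation index sets."""
    rng = np.random.default_rng(seed)  # the paper does not state a seed; 0 is an assumption
    perm = rng.permutation(num_examples)
    return perm[:num_train], perm[num_train:]

# CIFAR-100: 40,000 of the 50,000 training images for training, the rest for validation.
cifar_train, cifar_val = make_split(50_000, 40_000)

# SVHN: 80-20% split of the 604,388 non-test images.
svhn_train, svhn_val = make_split(604_388, int(0.8 * 604_388))
```

The same helper covers the Caltech-256 and CUB-2011 splits by applying it per-category or with a 75-25 ratio, respectively.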
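The "ghost batch size" in the experiment setup refers to ghost batch normalization: normalization statistics are computed over small sub-batches rather than the full batch. A minimal training-mode sketch in NumPy, with the learnable scale/shift and running statistics omitted; the function name and epsilon value are assumptions:

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Normalize each ghost sub-batch of x (shape [B, features]) with its own statistics."""
    B = x.shape[0]
    assert B % ghost_size == 0, "overall batch size must be divisible by ghost batch size"
    out = np.empty_like(x)
    for start in range(0, B, ghost_size):
        g = x[start:start + ghost_size]
        mu = g.mean(axis=0, keepdims=True)    # per-feature mean over the ghost batch only
        var = g.var(axis=0, keepdims=True)    # per-feature variance over the ghost batch only
        out[start:start + ghost_size] = (g - mu) / np.sqrt(var + eps)
    return out

# e.g. an overall batch size B = 128 split into ghost batches of 32
x = np.random.randn(128, 8)
y = ghost_batch_norm(x, ghost_size=32)
```

Each ghost batch ends up with approximately zero mean and unit variance per feature, which is the batch-size-dependent regularization effect the paper's ghost-batch experiments vary.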