Four Things Everyone Should Know to Improve Batch Normalization

Authors: Cecilia Summers, Michael J. Dinneen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet.
Researcher Affiliation | Academia | Cecilia Summers, Department of Computer Science, University of Auckland, EMAIL; Michael J. Dinneen, Department of Computer Science, University of Auckland, EMAIL
Pseudocode | No | The paper contains mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We have released code at https://github.com/ceciliaresearch/four_things_batch_norm.
Open Datasets | Yes | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet. ... ImageNet ILSVRC 2012 validation set (Russakovsky et al., 2015) ... CIFAR-100 (Krizhevsky & Hinton, 2009) ... SVHN (Netzer et al., 2011) ... Flowers-102 (Nilsback & Zisserman, 2008) ... CUB-2011 (Wah et al., 2011)
Dataset Splits | Yes | Of the six datasets we experiment with, only ImageNet (Russakovsky et al., 2015) and Flowers-102 (Nilsback & Zisserman, 2008) have their own pre-defined validation split, so we constructed validation splits for the other datasets as follows: for CIFAR-100 (Krizhevsky & Hinton, 2009), we randomly took 40,000 of the 50,000 training images for the training split, and the remaining 10,000 as a validation split. For SVHN (Netzer et al., 2011), we similarly split the 604,388 non-test images in an 80-20% split for training and validation. For Caltech-256, no canonical splits of any form are defined, so we used 40 images of each of the 256 categories for training, 10 images for validation, and 30 for testing. For CUB-2011, we used 25% of the given training data as a validation set.
Hardware Specification | Yes | All experiments were done on two Nvidia GeForce GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions the TensorFlow-slim image classification model library but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The model used for CIFAR-100 and SVHN was ResNet-18 (He et al., 2016b;a) with 64, 128, 256, and 512 filters across blocks. For Caltech-256, a much larger Inception-v3 (Szegedy et al., 2016) model was used, and we additionally experiment with ResNet-152 (He et al., 2016b) on Flowers-102 and CUB-2011 in Sec. 4.3. All experiments were done on two Nvidia Geforce GTX 1080 Ti GPUs. ... with an overall batch size of B and a ghost batch size of B ... with a batch size B = 128.
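The split construction quoted under "Dataset Splits" is simple to reproduce mechanically. A minimal sketch with NumPy; the helper name and the random seed are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def make_split(num_examples, num_train, seed=0):
    """Randomly partition example indices into disjoint train/validation index sets."""
    rng = np.random.default_rng(seed)  # the paper does not state a seed; 0 is an assumption
    perm = rng.permutation(num_examples)
    return perm[:num_train], perm[num_train:]

# CIFAR-100: 40,000 of the 50,000 training images for training, the rest for validation.
cifar_train, cifar_val = make_split(50_000, 40_000)

# SVHN: 80-20% split of the 604,388 non-test images.
svhn_train, svhn_val = make_split(604_388, int(0.8 * 604_388))
```

The same helper covers the Caltech-256 and CUB-2011 splits by applying it per-category or with a 75-25 ratio, respectively.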
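The "ghost batch size" in the experiment setup refers to ghost batch normalization: normalization statistics are computed over small sub-batches rather than the full batch. A minimal training-mode sketch in NumPy, with the learnable scale/shift and running statistics omitted; the function name and epsilon value are assumptions:

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Normalize each ghost sub-batch of x (shape [B, features]) with its own statistics."""
    B = x.shape[0]
    assert B % ghost_size == 0, "overall batch size must be divisible by ghost batch size"
    out = np.empty_like(x)
    for start in range(0, B, ghost_size):
        g = x[start:start + ghost_size]
        mu = g.mean(axis=0, keepdims=True)    # per-feature mean over the ghost batch only
        var = g.var(axis=0, keepdims=True)    # per-feature variance over the ghost batch only
        out[start:start + ghost_size] = (g - mu) / np.sqrt(var + eps)
    return out

# e.g. an overall batch size B = 128 split into ghost batches of 32
x = np.random.randn(128, 8)
y = ghost_batch_norm(x, ghost_size=32)
```

Each ghost batch ends up with approximately zero mean and unit variance per feature, which is the batch-size-dependent regularization effect the paper's ghost-batch experiments vary.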