Batch Normalization Preconditioning for Neural Network Training
Authors: Susanna Lange, Kyle Helfrich, Qiang Ye
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide experimental results in Section 5. ... In this section, we compare BNP with several baseline methods on several architectures for image classification tasks. We also present some exploratory experiments to study computational timing comparison as well as improved condition numbers. |
| Researcher Affiliation | Academia | Susanna Lange EMAIL Department of Mathematics, University of Kentucky Lexington, KY 40506 Kyle Helfrich EMAIL Department of Mathematics, University of Dayton Dayton, OH 45469 Qiang Ye EMAIL Department of Mathematics, University of Kentucky Lexington, KY 40506 |
| Pseudocode | Yes | Algorithm 1 Batch Normalization Bβ,γ(h) ... Algorithm 2 One Step of BNP Training on W (ℓ), b(ℓ) of the ℓth Dense Layer ... Algorithm 3 One Step of BNP Training of a Convolution Layer with weight w and bias b |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Data sets: We use MNIST, CIFAR10, CIFAR100, and ImageNet data sets. The MNIST data set (LeCun et al., 2013) ... The CIFAR10 and CIFAR100 data sets (Krizhevsky et al., 2009) ... The ImageNet data set (Russakovsky et al., 2015) |
| Dataset Splits | Yes | The MNIST data set (LeCun et al., 2013) consists of 70,000 black and white images of handwritten digits ranging from 0 to 9. Each image is 28 by 28 pixels. There are 60,000 training images and 10,000 testing images. The CIFAR10 and CIFAR100 data sets (Krizhevsky et al., 2009) consist of 60,000 color images of 32 by 32 pixels with 50,000 training images and 10,000 testing images. ... The ImageNet data set (Russakovsky et al., 2015) consists of 1,431,167 color images with 1,281,167 training images, 50,000 validation images, and 100,000 testing images. |
| Hardware Specification | Yes | These performance time experiments are computed on NVIDIA Tesla V100-SXM2-32GB. |
| Software Dependencies | Yes | Experiments were run using PyTorch and TensorFlow versions 1.13.1 and 2.4.1. |
| Experiment Setup | Yes | Default hyperparameters for BN and optimizers as implemented in TensorFlow or PyTorch are used, as appropriate. For BNP, the default values ϵ1 = 10^-2, ϵ2 = 10^-4, and ρ = 0.99 are also used. ... Each model is trained using SGD. ... We implement all parameter settings suggested in He et al. (2016a) for BN, with the exception that Preactivation ResNet-110 for CIFAR-100 follows the learning rate decay suggested in Han et al. (2016). These include weight regularization of 1E-4 and a learning rate warmup with initial learning rate 0.01 increasing to 0.1 after 400 iterations. For networks with GN, we follow Wu and He (2018) and replace all BN layers with GN. We use group size 4. For BNP+GN with CIFAR10, we use weight regularization of 1.5E-4, the He-Normal weight initialization scaled by 0.1, and group size 4 in GN. For BNP+GN with CIFAR100, we use weight regularization of 2E-4, the He-Normal weight initialization scaled by 0.4, and group size 4. GN and BNP+GN use a linear warmup schedule, with initial learning rate 0.01 increasing to 0.1 over 1 or 2 epochs, tuned for each network. For the ResNet-18 experiment with ImageNet, we follow the settings of Krizhevsky et al. (2012). All images are cropped to 224 × 224 pixels from each image or its horizontal flip (Krizhevsky et al., 2012). All models use a momentum optimizer with momentum 0.9 and weight regularization 1E-4 (except BNP+GN, which uses 8.5E-4), use a mini-batch size of 256, and train on 1 GPU. All models use an initial learning rate of 0.1, which is divided by 10 at 30, 60, and 90 epochs. Both GN and BNP+GN use group size 32. The best learning rates for all models are listed in Table 3. |
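The Pseudocode row above cites the paper's Algorithm 1, the batch normalization transform Bβ,γ(h). As a point of reference, a minimal NumPy sketch of the standard per-feature BN forward pass might look like the following; the ϵ value and array shapes here are illustrative assumptions, and this is plain BN, not the paper's BNP preconditioning update (Algorithms 2-3):

```python
import numpy as np

def batch_norm(h, beta, gamma, eps=1e-5):
    """Sketch of batch normalization B_{beta,gamma}(h) over a mini-batch.

    h: (batch, features) activations; beta, gamma: (features,) shift/scale.
    eps is an illustrative stability constant, not a value from the paper.
    """
    mu = h.mean(axis=0)                    # per-feature batch mean
    var = h.var(axis=0)                    # per-feature batch variance
    h_hat = (h - mu) / np.sqrt(var + eps)  # normalize to ~zero mean, unit variance
    return gamma * h_hat + beta            # learnable scale and shift
```

With gamma = 1 and beta = 0 the output of each feature column has (approximately) zero mean and unit variance over the batch, which is the normalization the pseudocode relies on.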
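The ImageNet setup quoted in the Experiment Setup row describes a stepwise schedule: initial learning rate 0.1, divided by 10 at epochs 30, 60, and 90. A small sketch of that schedule (function name and defaults are ours, chosen to match the quoted numbers):

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), factor=0.1):
    """Stepwise learning rate decay as described in the reported setup:
    start at base_lr and multiply by `factor` at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:  # every milestone already passed applies one decay
            lr *= factor
    return lr
```

This is equivalent to PyTorch's `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[30, 60, 90]` and `gamma=0.1`.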