Towards Optimization-Friendly Binary Neural Network

Authors: Nianhui Guo, Joseph Bethge, Hong Guo, Christoph Meinel, Haojin Yang

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate the superiority of the method on several vision classification tasks, CIFAR-10/100 & ImageNet. For instance, the BNext family outperforms previous BNNs under different capacity levels and contributes the first binary neural network to reach the state-of-the-art 80.57% Top-1 accuracy on ImageNet with 0.82 GOPs, which verifies the potential of BNNs and already contributes a strong baseline for future research on high-accuracy BNNs."
Researcher Affiliation | Academia | "Nianhui Guo, EMAIL, Hasso Plattner Institut, University of Potsdam, Germany"
Pseudocode | Yes | "Algorithm 1: Visualization of the loss landscape of BNNs; Algorithm 2: Consecutive Knowledge Distillation"
Open Source Code | Yes | "The code is publicly available at https://github.com/hpi-xnor/BNext."
Open Datasets | Yes | "In this section, we evaluate our model and methods on the ILSVRC12 ImageNet (Deng et al., 2009) and the CIFAR datasets."
Dataset Splits | Yes | "Note that the ImageNet dataset also needs to be downloaded and prepared manually in the usual manner (using a train and a val folder for the respective splits). The validation images need to be moved into labeled subfolders." "CIFAR: We train the model with a batch size of 128 for 256 epochs, using standard data augmentations such as random crop, random horizontal flip, and normalization (Paszke et al., 2019)."
Hardware Specification | Yes | "The models are trained on 8 Nvidia DGX-A100 GPUs." "Our study evaluates the inference efficiency of binary neural networks (BNNs) on a Banana Pi M5, using the Larq library."
Software Dependencies | Yes | "We use a virtualized environment for PyTorch (Paszke et al., 2019) based on Anaconda for our code setup. The host system thus needs support for Python 3.9.13 and a recent NVIDIA CUDA driver (we tested driver version 470.82.01 with CUDA 11.4 before this submission) for training with a GPU."
Experiment Setup | Yes | "We employ the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 1e-3 and weight decays of 1e-3 and 1e-8 for non-binary and binary parameters, respectively. We warm up the learning rate for 5 epochs and then reduce it using a cosine scheduler (Paszke et al., 2019). We set the backward gradient clipping range of the STE (Func. 1) to [-1.5, 1.5]. ImageNet: We train our model with an input resolution of 224x224 and a batch size of 512 for 512 epochs. To enhance data augmentation, we use RandAugment(7, 0.5) (Cubuk et al., 2020). CIFAR: We train the model with a batch size of 128 for 256 epochs, using standard data augmentations such as random crop, random horizontal flip, and normalization (Paszke et al., 2019). We optimize the model using cross-entropy loss (Paszke et al., 2019)."
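The two BNN-specific pieces of this setup can be illustrated with a minimal NumPy sketch: a sign binarizer whose straight-through estimator (STE) passes gradients only inside the stated [-1.5, 1.5] clipping range, plus the 5-epoch linear warmup followed by cosine decay. Function names and defaults here are illustrative assumptions, not the authors' API:

```python
import math
import numpy as np

def binarize(x):
    # Forward pass: quantize weights/activations to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def ste_grad(x, grad_output, clip_range=1.5):
    # Backward pass (straight-through estimator): sign() has zero
    # gradient almost everywhere, so pass the incoming gradient
    # through unchanged, but only where |x| <= clip_range.
    mask = (np.abs(x) <= clip_range).astype(grad_output.dtype)
    return grad_output * mask

def lr_at(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=512):
    # Linear warmup for the first epochs, then cosine decay toward 0.
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

Inputs outside the clipping range receive zero gradient, which keeps already-saturated binary weights from oscillating; the warmup avoids large early updates at the full 1e-3 learning rate.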