Towards Optimization-Friendly Binary Neural Network
Authors: Nianhui Guo, Joseph Bethge, Hong Guo, Christoph Meinel, Haojin Yang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the superiority of the method on several vision classification tasks (CIFAR-10/100 & ImageNet). For instance, the BNext family outperforms previous BNNs under different capacity levels and contributes the first binary neural network to reach the state-of-the-art 80.57% Top-1 accuracy on ImageNet with 0.82 GOPS, which verifies the potential of BNNs and already contributes a strong baseline for future research on high-accuracy BNNs. |
| Researcher Affiliation | Academia | Nianhui Guo EMAIL Hasso Plattner Institut, University of Potsdam, Germany |
| Pseudocode | Yes | Algorithm 1: Visualizing the loss landscape of BNNs; Algorithm 2: Consecutive Knowledge Distillation |
| Open Source Code | Yes | The code is publicly available at https://github.com/hpi-xnor/BNext. |
| Open Datasets | Yes | In this section, we evaluate our model and methods on the ILSVRC12 ImageNet (Deng et al., 2009) and the CIFAR datasets. |
| Dataset Splits | Yes | Note that the ImageNet dataset also needs to be downloaded and prepared manually in the usual manner (using a train and val folder for the respective split). The validation images need to be moved into labeled subfolders. CIFAR: We train the model with a batch size of 128 for 256 epochs, using standard data augmentations such as random crop, random horizontal flip, and normalization (Paszke et al., 2019). |
| Hardware Specification | Yes | The models are trained on 8 Nvidia DGX-A100 GPUs. Our study evaluates inference efficiency of binary neural networks (BNNs) on a Banana Pi M5, using the Larq Library. |
| Software Dependencies | Yes | We use a virtualized environment for PyTorch (Paszke et al., 2019) based on Anaconda for our code setup. The host system thus needs support for Python 3.9.13 and a recent NVIDIA CUDA driver (we tested driver version 470.82.01 with CUDA 11.4 before this submission) for training with GPU. |
| Experiment Setup | Yes | We employ the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 10⁻³ and weight decays of 10⁻³ and 10⁻⁸ for non-binary and binary parameters, respectively. We warm up the learning rate for 5 epochs and then reduce it using a cosine scheduler (Paszke et al., 2019). We set the backward gradient clipping range of the STE (Func. 1) as [-1.5, 1.5]. ImageNet: We train our model with an input resolution of 224x224 and a batch size of 512 for 512 epochs. To enhance data augmentation, we use RandAugment(7, 0.5) (Cubuk et al., 2020). CIFAR: We train the model with a batch size of 128 for 256 epochs, using standard data augmentations such as random crop, random horizontal flip, and normalization (Paszke et al., 2019). We optimize the model using cross-entropy loss (Paszke et al., 2019). |
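The experiment-setup row describes two pieces worth making concrete: the 5-epoch warmup followed by cosine learning-rate decay, and the straight-through estimator (STE) with its backward gradient clipped to [-1.5, 1.5]. The sketch below is illustrative only, not the authors' released code; the function names `lr_at` and `ste_backward` and the list-based tensor stand-in are assumptions made for a self-contained example.

```python
import math

def lr_at(epoch, base_lr=1e-3, warmup=5, total=512):
    """Learning rate at a given epoch: linear warmup for `warmup` epochs,
    then cosine decay over the remaining epochs (ImageNet schedule)."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup   # linear ramp up to base_lr
    t = (epoch - warmup) / (total - warmup)     # progress through cosine phase, 0..1
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

def ste_backward(x, grad_out, clip=1.5):
    """Straight-through estimator for sign(x): the gradient passes through
    unchanged where the input lies in [-clip, clip] and is zeroed outside,
    matching the [-1.5, 1.5] clipping range stated in the setup."""
    return [g if -clip <= xi <= clip else 0.0 for xi, g in zip(x, grad_out)]
```

In a real PyTorch training loop the same split between binary and non-binary parameters would be expressed as two parameter groups passed to `torch.optim.AdamW`, each with its own `weight_decay`.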