Training DNNs Resilient to Adversarial and Random Bit-Flips by Learning Quantization Ranges

Authors: Kamran Chitsaz, Gonçalo Mordido, Jean-Pierre David, François Leduc-Primeau

TMLR 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Our experimental results on different models and datasets showcase that DNNs trained with WCAT can tolerate a high amount of noise while keeping the accuracy close to the baseline model.
Researcher Affiliation | Academia | Kamran Chitsaz EMAIL Polytechnique Montreal; Gonçalo Mordido EMAIL Mila, Polytechnique Montreal; Jean-Pierre David EMAIL Polytechnique Montreal; François Leduc-Primeau EMAIL Polytechnique Montreal
Pseudocode | Yes | Algorithm 1: Weight clipping-aware training (WCAT)
Open Source Code | Yes | Code is available at https://github.com/kmchiti/WCAT
Open Datasets | Yes | We conducted our experiments using three popular vision datasets: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and ImageNet (Krizhevsky et al., 2017).
Dataset Splits | Yes | Both CIFAR10 and CIFAR100 comprise 60,000 RGB images with dimensions of 32×32×3. As per the standard approach, 50,000 examples were utilized for training and the remaining 10,000 for testing, with images evenly drawn from 10 and 100 classes, respectively. On the other hand, ImageNet is a more extensive dataset, consisting of 1.2 million training images across 1,000 classes, with input dimensions of 224×224×3.
Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU models, CPU models, memory specifications) used for training or inference of the DNN models.
Software Dependencies | No | The paper mentions using stochastic gradient descent (SGD) but does not name specific software libraries (e.g., PyTorch, TensorFlow) or their version numbers. It mentions Accelergy and CACTI only in the context of energy modeling, not as versioned software dependencies for the main experiments.
Experiment Setup | Yes | Each model on CIFAR10 and CIFAR100 was trained using stochastic gradient descent (SGD; Robbins & Monro, 1951) with a Nesterov momentum of 0.9, a weight decay of 5×10⁻⁴, and a batch size of 128 for 300 epochs, for both 4- and 8-bit weight quantization. We used a cosine annealing learning rate scheduler with an initial rate of 0.1. To fine-tune our 8-bit network on ImageNet, we began with pre-trained weights and conducted a single epoch of additional training, using SGD with momentum and a learning rate of 10⁻⁴. As for our 4-bit networks, we initialized the pre-trained weights and fine-tuned them for 110 epochs using SGD with momentum, with a batch size of 256. We implemented exponential learning rate decay, starting from an initial rate of 0.0015 and decaying it to a final value of 10⁻⁶.
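The two learning-rate schedules reported in the Experiment Setup row can be sketched in plain Python. The epoch counts and rates come from the text above; the per-epoch decay factor for the exponential schedule is derived here and is an assumption about how the decay was parameterized:

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_init=0.1, lr_min=0.0):
    """Cosine annealing from lr_init down to lr_min (CIFAR10/CIFAR100 training)."""
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

def exponential_decay_lr(epoch, total_epochs=110, lr_init=0.0015, lr_final=1e-6):
    """Exponential decay from lr_init to lr_final (4-bit ImageNet fine-tuning).

    The per-epoch factor gamma is chosen so that lr reaches lr_final at the
    last epoch; this parameterization is an assumption, not stated in the text.
    """
    gamma = (lr_final / lr_init) ** (1.0 / total_epochs)
    return lr_init * gamma ** epoch

print(cosine_annealing_lr(0))      # initial rate of 0.1
print(exponential_decay_lr(110))   # decays to roughly 10**-6
```

In a PyTorch training loop these would correspond to the built-in `CosineAnnealingLR` and `ExponentialLR` schedulers, but the pure-Python form makes the endpoints easy to check.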
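The Pseudocode row points to Algorithm 1 (WCAT) in the paper itself. As a rough illustration of the general idea named in the title, clipping weights to a learned quantization range, here is a minimal hypothetical sketch; `fake_quantize`, the symmetric-range parameterization, and the level count are assumptions, not a reproduction of the authors' algorithm:

```python
def fake_quantize(w, clip_range, bits=4):
    """Clip a weight to [-clip_range, clip_range], then quantize it uniformly.

    Hypothetical sketch of clipping-aware weight quantization with a learned
    range; NOT the paper's Algorithm 1.
    """
    levels = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    w_clipped = max(-clip_range, min(clip_range, w))  # saturate at the learned range
    step = clip_range / levels                        # uniform step inside the range
    return round(w_clipped / step) * step

# Out-of-range weights saturate at the learned range, which is in the spirit of
# bounding how far any single stored value (and hence any bit-flip) can deviate.
weights = [-0.9, -0.1, 0.03, 0.5]
print([fake_quantize(w, clip_range=0.25) for w in weights])
```

In actual training the clip range would be a learnable parameter updated by gradient descent alongside the weights, which is what "learning quantization ranges" refers to in the title.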