Training DNNs Resilient to Adversarial and Random Bit-Flips by Learning Quantization Ranges

Authors: Kamran Chitsaz, Gonçalo Mordido, Jean-Pierre David, François Leduc-Primeau

TMLR 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Our experimental results on different models and datasets showcase that DNNs trained with WCAT can tolerate a high amount of noise while keeping the accuracy close to the baseline model.
Researcher Affiliation | Academia | Kamran Chitsaz EMAIL Polytechnique Montreal; Gonçalo Mordido EMAIL Mila, Polytechnique Montreal; Jean-Pierre David EMAIL Polytechnique Montreal; François Leduc-Primeau EMAIL Polytechnique Montreal
Pseudocode | Yes | Algorithm 1: Weight clipping-aware training (WCAT)
Open Source Code | Yes | Code is available at https://github.com/kmchiti/WCAT
Open Datasets | Yes | We conducted our experiments using three popular vision datasets: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and ImageNet (Krizhevsky et al., 2017).
Dataset Splits | Yes | Both CIFAR10 and CIFAR100 comprise 60,000 RGB images with dimensions of 32×32×3. As per the standard approach, 50,000 examples were utilized for training and the remaining 10,000 for testing, with images evenly drawn from 10 and 100 classes, respectively. On the other hand, ImageNet is a more extensive dataset, consisting of 1.2 million training images across 1,000 classes, with input dimensions of 224×224×3.
Hardware Specification | No | The paper does not explicitly state the specific hardware (e.g., GPU models, CPU models, memory specifications) used for training or inference of the DNN models.
Software Dependencies | No | The paper mentions using stochastic gradient descent (SGD) but does not name specific software libraries (e.g., PyTorch, TensorFlow) or their version numbers. It mentions Accelergy and CACTI only in the context of energy modeling, not as versioned software dependencies for the main experiments.
Experiment Setup | Yes | Each model on CIFAR10 and CIFAR100 was trained using stochastic gradient descent (SGD; Robbins & Monro, 1951) with a Nesterov momentum of 0.9, a weight decay of 5×10⁻⁴, and a batch size of 128 for 300 epochs, for both 4- and 8-bit weight quantization. We used a cosine annealing learning rate scheduler with an initial rate of 0.1. To fine-tune our 8-bit network on ImageNet, we began with pre-trained weights and conducted a single epoch of additional training, using SGD with momentum and a learning rate of 10⁻⁴. As for our 4-bit networks, we initialized the pre-trained weights and fine-tuned them for 110 epochs using SGD with momentum, with a batch size of 256. We implemented exponential learning rate decay, starting from an initial rate of 0.0015 and decaying it to a final value of 10⁻⁶.
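The two learning-rate schedules reported in the Experiment Setup row can be sketched in plain Python. The epoch counts and rates come from the text above; the per-epoch decay factor for the exponential schedule is derived here and is an assumption about how the decay was parameterized:

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_init=0.1, lr_min=0.0):
    """Cosine annealing from lr_init down to lr_min (CIFAR10/CIFAR100 training)."""
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

def exponential_decay_lr(epoch, total_epochs=110, lr_init=0.0015, lr_final=1e-6):
    """Exponential decay from lr_init to lr_final (4-bit ImageNet fine-tuning).

    The per-epoch factor gamma is chosen so that lr reaches lr_final at the
    last epoch; this parameterization is an assumption, not stated in the text.
    """
    gamma = (lr_final / lr_init) ** (1.0 / total_epochs)
    return lr_init * gamma ** epoch

print(cosine_annealing_lr(0))      # initial rate of 0.1
print(exponential_decay_lr(110))   # decays to roughly 10**-6
```

In a PyTorch training loop these would correspond to the built-in `CosineAnnealingLR` and `ExponentialLR` schedulers, but the pure-Python form makes the endpoints easy to check.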
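The Pseudocode row points to Algorithm 1 (WCAT) in the paper itself. As a rough illustration of the general idea named in the title, clipping weights to a learned quantization range, here is a minimal hypothetical sketch; `fake_quantize`, the symmetric-range parameterization, and the level count are assumptions, not a reproduction of the authors' algorithm:

```python
def fake_quantize(w, clip_range, bits=4):
    """Clip a weight to [-clip_range, clip_range], then quantize it uniformly.

    Hypothetical sketch of clipping-aware weight quantization with a learned
    range; NOT the paper's Algorithm 1.
    """
    levels = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed
    w_clipped = max(-clip_range, min(clip_range, w))  # saturate at the learned range
    step = clip_range / levels                        # uniform step inside the range
    return round(w_clipped / step) * step

# Out-of-range weights saturate at the learned range, which is in the spirit of
# bounding how far any single stored value (and hence any bit-flip) can deviate.
weights = [-0.9, -0.1, 0.03, 0.5]
print([fake_quantize(w, clip_range=0.25) for w in weights])
```

In actual training the clip range would be a learnable parameter updated by gradient descent alongside the weights, which is what "learning quantization ranges" refers to in the title.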