GradQ-ViT: Robust and Efficient Gradient Quantization for Vision Transformers

Authors: Dahun Choi, Hyun Kim

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "When quantizing weights, activations, and gradients to INT8, our method improves performance by 0.52% and 0.21% over DeiT-Base and Swin-Base, respectively, and achieves near parity with MobileViT-S with only a 0.09% accuracy drop. Furthermore, a 2.06× speedup was observed when applying our framework to MobileViT in a CUDA 11.8 environment. (...) Experimental Results"

Researcher Affiliation | Academia | "Dahun Choi, Hyun Kim* — Department of Electrical and Information Engineering, The Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Korea. EMAIL, EMAIL"

Pseudocode | Yes | "Algorithm 1: Overall gradient quantization process"

Open Source Code | No | The paper describes implementing "custom kernel code" for convolution and Matmul operations in a CUDA 11.8 environment, but it does not state that this code is open source or publicly available, and it provides no repository link.

Open Datasets | Yes | "We evaluate the performance of the proposed algorithm on the image classification task using the ImageNet dataset (Russakovsky et al. 2015) with the PyTorch framework in a GPU (RTX-3090) environment."

Dataset Splits | No | The paper uses the ImageNet dataset with models (DeiT, Swin, MobileViT) that commonly follow standard splits, but it does not explicitly give the percentages or sample counts of the training, validation, or test splits.

Hardware Specification | Yes | "We evaluate the performance of the proposed algorithm on the image classification task using the ImageNet dataset (Russakovsky et al. 2015) with the PyTorch framework in a GPU (RTX-3090) environment. (...) Table 4: Running time of MobileViT on the GeForce RTX 3090 system"

Software Dependencies | Yes | "We implemented a custom kernel code for convolution and Matmul operations in a CUDA 11.8 environment, making full use of the parallel processing power of the GPU."

Experiment Setup | No | The paper mentions using the AdamW optimizer and a cosine scheduler for MobileViT, but it does not give specific hyperparameter values such as learning rate, batch size, or number of epochs in the main text. It refers to the "official code" for DeiT and Swin, implying that settings may be derived from there, but does not list them.
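The paper's "Algorithm 1: Overall gradient quantization process" is not reproduced in this report. As a rough illustration of the generic technique involved — not GradQ-ViT's actual algorithm — a symmetric per-tensor INT8 quantize/dequantize round trip of the kind commonly applied to gradients can be sketched in plain Python:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization. This is an
# assumption-laden illustration of the general technique, not the paper's
# proposed gradient quantization method.

def quantize_int8(values):
    """Map a list of floats onto the signed INT8 grid [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Per-tensor scale: the largest magnitude maps to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float values from the INT8 codes."""
    return [c * scale for c in codes]

# Round-trip a toy gradient tensor.
grads = [0.03, -1.27, 0.5, 0.0]
codes, scale = quantize_int8(grads)
approx = dequantize_int8(codes, scale)
```

In practice, frameworks fuse such quantize/dequantize steps into INT8 kernels (as the paper's custom CUDA convolution and Matmul kernels reportedly do) so the matrix products themselves run in low precision.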