SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

Authors: Minjun Kim, Jongjin Kim, U Kang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that SYNQ provides the state-of-the-art accuracy, over existing ZSQ methods."
Researcher Affiliation | Academia | Minjun Kim, Jongjin Kim & U Kang, Seoul National University, Seoul, South Korea
Pseudocode | Yes | "Algorithm 1 Quantization procedure of SYNQ"
Open Source Code | Yes | "Reproducibility. All of our implementation and datasets are available at https://github.com/snudm-starlab/SynQ."
Open Datasets | Yes | "We evaluate our method across three datasets by reporting the top-1 accuracy for the validation sets of CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (ILSVRC 2012) (Deng et al., 2009) datasets."
Dataset Splits | Yes | "We evaluate our method across three datasets by reporting the top-1 accuracy for the validation sets of CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (ILSVRC 2012) (Deng et al., 2009) datasets."
Hardware Specification | Yes | "All of our experiments were done at a workstation with Intel Xeon Silver 4214 and RTX 3090."
Software Dependencies | No | "We implement SYNQ with PyTorch and TorchVision libraries in Python." (library version numbers are not reported)
Experiment Setup | Yes | "We generate 5,120 images with a batch size of 256. The batch size for fine-tuning is 256 for CIFAR-10/100 and 16 for ImageNet with epochs uniformly set to 100. We search τ, D0, λCE, and λCAM within the ranges {0.5, 0.55, 0.6, 0.65, 0.7}, {20, 40, 60, 80, 100}, {0.005, 0.05, 0.5, 5}, and {20, 50, 100, 200, 300, 500, 2000}, respectively. All of our experiments were done at a workstation with Intel Xeon Silver 4214 and RTX 3090. ... For the fine-tuning of the quantized model, the procedure follows Equation 6, employing SGD with a momentum of 0.9 and a weight decay of 1e-4. The batch size is set to 256 for CIFAR-10/100 and 16 for ImageNet. Initial learning rate is searched within the range of {1e-4, 1e-5, 1e-6} and is decayed by a factor of 0.1 over training epochs nep = 100."
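The hyperparameter search quoted above can be sketched as a plain grid enumeration. The ranges below are taken directly from the report; how the authors actually traverse or prune this grid is not specified, so this is only an illustrative sketch.

```python
from itertools import product

# Search ranges reported for SynQ's fine-tuning hyperparameters.
TAU = [0.5, 0.55, 0.6, 0.65, 0.7]                  # tau
D0 = [20, 40, 60, 80, 100]                          # D0
LAMBDA_CE = [0.005, 0.05, 0.5, 5]                   # lambda_CE
LAMBDA_CAM = [20, 50, 100, 200, 300, 500, 2000]     # lambda_CAM
LR = [1e-4, 1e-5, 1e-6]  # initial LR, decayed by 0.1 over 100 epochs

def candidate_configs():
    """Yield every hyperparameter combination in the reported ranges."""
    for tau, d0, l_ce, l_cam, lr in product(TAU, D0, LAMBDA_CE, LAMBDA_CAM, LR):
        yield {"tau": tau, "D0": d0, "lambda_CE": l_ce,
               "lambda_CAM": l_cam, "lr": lr}

# 5 * 5 * 4 * 7 * 3 = 2100 candidate configurations in total.
print(sum(1 for _ in candidate_configs()))  # → 2100
```

The full grid has 2,100 combinations, which shows why the report's per-dataset search over these ranges is a nontrivial part of the experimental budget.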
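The evaluation rows above report top-1 accuracy on the validation sets. As a minimal plain-Python sketch of that metric (not the authors' code, which operates on PyTorch tensors):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the true label.

    `logits` is a list of per-class score lists; `labels` holds true class ids.
    """
    correct = sum(
        1
        for scores, y in zip(logits, labels)
        if max(range(len(scores)), key=scores.__getitem__) == y
    )
    return correct / len(labels)

# Toy check: two of the three predictions pick the labeled class.
print(top1_accuracy([[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]], [1, 0, 0]))
```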