Oscillations Make Neural Networks Robust to Quantization

Authors: Jonathan Wenshøj, Bob Pepin, Raghavendra Selvan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on ResNet-18 and Tiny Vision Transformer, evaluated on CIFAR-10 and Tiny-ImageNet datasets, demonstrate across a range of quantization levels that training with oscillations followed by post-training quantization (PTQ) is sufficient to recover the performance of QAT in most cases.
Researcher Affiliation | Academia | Jonathan Wenshøj (EMAIL), Bob Pepin (EMAIL), Raghavendra Selvan (EMAIL); Department of Computer Science, University of Copenhagen, Denmark
Pseudocode | No | The paper provides mathematical derivations and describes mechanisms, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Source code is available at https://github.com/saintslab/osc_reg.
Open Datasets | Yes | We conducted our experiments using the CIFAR-10 (Krizhevsky & Hinton, 2009) and Tiny-ImageNet (Le & Yang, 2015) datasets.
Dataset Splits | Yes | Training proceeded for a maximum of 100 epochs with early stopping triggered after 10 epochs without improvement in validation performance. For quantized models, we monitored the quantized validation accuracy at the target bit precision, while for the baseline, we tracked floating-point accuracy. Fine-tuning continued for up to 200 epochs on CIFAR-10 and 50 epochs for Tiny-ImageNet, with early stopping after 30 epochs without improvement, using the same accuracy metrics as training from scratch.
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments (e.g., GPU models, CPU models, or cloud computing instances with specifications).
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma, 2014) but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation.
Experiment Setup | Yes | For the MLP5 architecture, we used a learning rate of 10⁻³ and regularization parameter λ=1. The ResNet-18 was trained with a learning rate of 10⁻³ and λ=0.75 (see Appx. A.2 for our hyperparameter selection). We modified the ResNet-18 architecture by replacing the input layer with a smaller 3×3 kernel and adapting the final layer for 10-class classification of both ResNet-18 and Tiny ViT. Training proceeded for a maximum of 100 epochs with early stopping triggered after 10 epochs without improvement in validation performance. [...] We fine-tuned two ImageNet-1k (Deng et al., 2009) pre-trained models on CIFAR-10 and Tiny-ImageNet: a Tiny ViT (learning rate: 10⁻⁴, λ ∈ {1, 0.75, 0.5} depending on the bit) and a ResNet-18 (learning rate: 10⁻³, λ ∈ {1, 0.75, 0.5} depending on the bit).
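The Research Type row contrasts post-training quantization (PTQ) with quantization-aware training (QAT). As a rough illustration of the PTQ side only, a per-tensor symmetric uniform quantizer can be sketched as follows; this is a generic textbook construction, not the paper's implementation (real code would operate on tensors, not Python lists):

```python
def ptq_symmetric(weights, bits):
    """Quantize a list of weights to `bits` bits on a symmetric uniform grid,
    then dequantize back to float (simulated quantization)."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax      # one scale per tensor
    clamp = lambda q: max(-qmax, min(qmax, q))
    return [clamp(round(w / scale)) * scale for w in weights]

w = [0.51, -0.27, 0.08, -0.64]
w_q = ptq_symmetric(w, bits=4)   # only 2**4 - 1 = 15 grid points remain
```

Since the scale is derived from the largest-magnitude weight, that weight is reproduced (up to rounding error) while all others snap to the nearest grid point.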
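The monitoring protocol quoted in the Dataset Splits row (early stopping on validation accuracy, with quantized accuracy tracked for quantized models and floating-point accuracy for the baseline) can be sketched generically. The function names and callback structure below are illustrative assumptions, not taken from the paper's code:

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=100, patience=10):
    """Run up to `max_epochs` training epochs, stopping once `patience`
    epochs pass without improvement in the monitored validation metric."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        acc = validate()        # quantized val. accuracy for quantized models,
        if acc > best_acc:      # floating-point accuracy for the baseline
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break               # no improvement for `patience` epochs
    return best_acc, epoch
```

With `max_epochs=100, patience=10` this matches the from-scratch protocol; fine-tuning would use `patience=30` and the longer epoch budgets quoted above.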
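The hyperparameters scattered through the Experiment Setup row can be collected into one place. The dictionary layout is purely organizational; the values come from the quoted text, and the per-bit choice of λ for fine-tuning is left unresolved because the paper's quote only states the candidate set:

```python
# Hyperparameters as reported in the Experiment Setup row (sketch).
CONFIGS = {
    "mlp5":              {"lr": 1e-3, "lam": 1.0},
    "resnet18_scratch":  {"lr": 1e-3, "lam": 0.75},
    # For fine-tuning, lambda is chosen from {1, 0.75, 0.5} depending on the
    # target bit width; the exact per-bit mapping is not quoted here.
    "tinyvit_finetune":  {"lr": 1e-4, "lam_choices": (1.0, 0.75, 0.5)},
    "resnet18_finetune": {"lr": 1e-3, "lam_choices": (1.0, 0.75, 0.5)},
}
```

All from-scratch runs additionally use the 100-epoch / 10-epoch-patience schedule described above.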