Oscillations Make Neural Networks Robust to Quantization
Authors: Jonathan Wenshøj, Bob Pepin, Raghavendra Selvan
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on ResNet-18 and Tiny Vision Transformer, evaluated on CIFAR-10 and Tiny-ImageNet datasets, demonstrate across a range of quantization levels that training with oscillations followed by post-training quantization (PTQ) is sufficient to recover the performance of QAT in most cases. |
| Researcher Affiliation | Academia | Jonathan Wenshøj EMAIL Bob Pepin EMAIL Raghavendra Selvan EMAIL Department of Computer Science, University of Copenhagen, Denmark |
| Pseudocode | No | The paper provides mathematical derivations and describes mechanisms, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/saintslab/osc_reg. |
| Open Datasets | Yes | We conducted our experiments using the CIFAR-10 (Krizhevsky & Hinton, 2009) and Tiny-ImageNet (Le & Yang, 2015) datasets. |
| Dataset Splits | Yes | Training proceeded for a maximum of 100 epochs with early stopping triggered after 10 epochs without improvement in validation performance. For quantized models, we monitored the quantized validation accuracy at the target bit precision, while for the baseline, we tracked floating-point accuracy. Fine-tuning continued for up to 200 epochs on CIFAR-10 and 50 epochs for Tiny-ImageNet, with early stopping after 30 epochs without improvement, using the same accuracy metrics as training from scratch. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments (e.g., GPU models, CPU models, or cloud computing instances with specifications). |
| Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma, 2014) but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | For the MLP5 architecture, we used a learning rate of 10⁻³ and regularization parameter λ=1. The ResNet-18 was trained with a learning rate of 10⁻³ and λ=0.75 (see Appx. A.2 for our hyperparameter selection). We modified the ResNet-18 architecture by replacing the input layer with a smaller 3×3 kernel and adapting the final layer for 10-class classification for both ResNet-18 and Tiny ViT. Training proceeded for a maximum of 100 epochs with early stopping triggered after 10 epochs without improvement in validation performance. [...] We fine-tuned two ImageNet-1k (Deng et al., 2009) pre-trained models on CIFAR-10 and Tiny-ImageNet: a Tiny ViT (learning rate: 10⁻⁴, λ ∈ {1, 0.75, 0.5} depending on the bit) and a ResNet-18 (learning rate: 10⁻³, λ ∈ {1, 0.75, 0.5} depending on the bit). |
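The early-stopping rule described in the setup (stop after a fixed number of epochs without improvement, monitoring quantized validation accuracy for quantized models and floating-point accuracy for the baseline) can be sketched as below. This is a minimal stdlib-only illustration, not the authors' released code; the class name and `step` interface are our own.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without improvement
    in the monitored validation metric. Per the paper's setup, this metric is
    quantized validation accuracy at the target bit precision for quantized
    models, and floating-point accuracy for the baseline. Patience is 10 when
    training from scratch and 30 when fine-tuning."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")   # best validation accuracy seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True to stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop one would call `stopper.step(val_acc)` once per epoch and break out of the loop when it returns `True`, up to the maximum epoch budget (100 from scratch; 200 on CIFAR-10 or 50 on Tiny-ImageNet when fine-tuning).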