PARQ: Piecewise-Affine Regularized Quantization

Authors: Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5, we conduct QAT experiments on low-bit quantization of convolution- and transformer-based vision models and demonstrate that PARQ obtains competitive performance compared to STE/BinaryConnect, as well as other methods based on nonconvex regularization." "We present numerical experiments to demonstrate that PARQ obtains competitive performance on convolution- and transformer-based vision tasks." "Each entry in Tables 1–3 shows the mean and standard deviation of test accuracies over three randomly seeded runs."
Researcher Affiliation | Collaboration | "¹Meta FAIR, United States. ²Dept. of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, United States. ³Meta Reality Labs, United States. Correspondence to: Lisa Jin <EMAIL>, Lin Xiao <EMAIL>."
Pseudocode | Yes | "Algorithm 1 PARQ"
Open Source Code | Yes | "Our implementation of PARQ in PyTorch is available at https://github.com/facebookresearch/parq."
Open Datasets | Yes | "We first evaluate quantized ResNet-20 and ResNet-56 (He et al., 2016) on CIFAR-10." "For QAT of ResNet-50 (He et al., 2016) on ImageNet, we quantize all residual block weights per channel by computing Q row-wise over tensors."
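The "per channel ... Q row-wise" setup quoted above can be illustrated with a minimal plain-Python sketch of generic per-channel (row-wise) symmetric uniform quantization. The function name, grid choice, and max-abs scale rule are illustrative assumptions for exposition, not the paper's actual quantizer Q.

```python
def quantize_rowwise(weight, num_bits):
    """Sketch of per-channel (row-wise) symmetric uniform quantization.

    Each row (output channel) gets its own scale from its max absolute
    value; values are snapped to the grid scale * {-L, ..., 0, ..., L}.
    This is a generic illustration, not PARQ's specific quantizer Q.
    """
    levels = 2 ** (num_bits - 1) - 1  # e.g. num_bits=2 gives grid {-1, 0, 1}
    quantized = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / levels if max_abs > 0 else 1.0
        quantized.append([round(v / scale) * scale for v in row])
    return quantized
```

Computing the scale independently per row is what "per channel" buys over per-tensor quantization: rows with small weights are not crushed by one large outlier elsewhere in the tensor.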
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits (e.g., percentages, sample counts, or citations to specific split methodologies) for CIFAR-10 or ImageNet. It repeatedly reports "test accuracy" but does not detail how the datasets were partitioned.
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as GPU or CPU models or cloud instance types.
Software Dependencies | No | "Our implementation of PARQ in PyTorch is available at https://github.com/facebookresearch/parq." The paper mentions PyTorch but does not pin a version number or list other software dependencies with versions.
Experiment Setup | Yes | "We train for 200 epochs using SGD with 0.9 momentum and 2e-4 weight decay. Following Zhu et al. (2022), the 0.1 learning rate decays by a factor of 10 at epochs 80, 120, and 150." "For QAT of ResNet-50 (He et al., 2016) on ImageNet, we quantize all residual block weights per channel by computing Q row-wise over tensors. We use SGD with 0.1 learning rate, 0.9 momentum, and 1e-4 weight decay. The learning rate decays by a factor of 10 every 30 epochs." "We use AdamW (Loshchilov & Hutter, 2018) to train for 300 epochs with a 5e-4 learning rate and 0.05 weight decay. We hold the learning rate at 1e-8 for the final 20 epochs (after PARQ and BinaryRelax converge to hard-quantization)."
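The CIFAR-10 schedule quoted above (initial learning rate 0.1, decayed by 10x at epochs 80, 120, and 150) is a standard step decay; in PyTorch it corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[80, 120, 150]` and `gamma=0.1`. A minimal stdlib-only sketch of that rule, with the function name chosen here for illustration:

```python
def stepped_lr(epoch, base_lr=0.1, milestones=(80, 120, 150), gamma=0.1):
    """Step-decay schedule: multiply the base rate by `gamma`
    once for every milestone the current epoch has reached.
    Defaults mirror the paper's quoted CIFAR-10 setup."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

The ImageNet schedule ("decays by a factor of 10 every 30 epochs") is the same rule with `milestones=(30, 60, 90, ...)`, i.e. PyTorch's `StepLR(step_size=30, gamma=0.1)`.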