PARQ: Piecewise-Affine Regularized Quantization
Authors: Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we conduct QAT experiments on low-bit quantization of convolution- and transformer-based vision models and demonstrate that PARQ obtains competitive performance compared to STE/BinaryConnect, as well as other methods based on nonconvex regularization. Each entry in Tables 1–3 shows the mean and standard deviation of test accuracies over three randomly seeded runs. |
| Researcher Affiliation | Collaboration | 1Meta FAIR, United States. 2Dept. of Industrial and Operational Engineering, University of Michigan, Ann Arbor, MI, United States. 3Meta Reality Labs, United States. Correspondence to: Lisa Jin <EMAIL>, Lin Xiao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PARQ |
| Open Source Code | Yes | Our implementation of PARQ in PyTorch is available at https://github.com/facebookresearch/parq. |
| Open Datasets | Yes | We first evaluate quantized ResNet-20 and ResNet-56 (He et al., 2016) on CIFAR-10. For QAT of ResNet-50 (He et al., 2016) on ImageNet, we quantize all residual block weights per channel by computing Q row-wise over tensors. |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or citations to specific split methodologies) for CIFAR-10 or ImageNet. It repeatedly refers to 'test accuracy' but does not detail how the datasets were partitioned for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running its experiments. |
| Software Dependencies | No | Our implementation of PARQ in PyTorch is available at https://github.com/facebookresearch/parq. The paper mentions PyTorch but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We train for 200 epochs using SGD with 0.9 momentum and 2e-4 weight decay. Following Zhu et al. (2022), the 0.1 learning rate decays by a factor of 10 at epochs 80, 120, and 150. For QAT of ResNet-50 (He et al., 2016) on ImageNet, we quantize all residual block weights per channel by computing Q row-wise over tensors. We use SGD with 0.1 learning rate, 0.9 momentum, and 1e-4 weight decay. The learning rate decays by a factor of 10 every 30 epochs. We use AdamW (Loshchilov & Hutter, 2018) to train for 300 epochs with a 5e-4 learning rate and 0.05 weight decay. We hold the learning rate at 1e-8 for the final 20 epochs (after PARQ and BinaryRelax converge to hard-quantization). |
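The stepwise learning-rate schedules quoted above (CIFAR-10: base LR 0.1, decayed by 10x at epochs 80, 120, 150; ImageNet: decayed by 10x every 30 epochs) can be sketched as a plain-Python step function. This is an illustrative reconstruction of the reported hyperparameters, not code from the PARQ repository; in PyTorch the same behavior would come from `torch.optim.lr_scheduler.MultiStepLR`.

```python
def step_lr(epoch, base_lr=0.1, milestones=(80, 120, 150), gamma=0.1):
    """Learning rate at a given epoch for a milestone-based step schedule.

    Defaults mirror the CIFAR-10 setup reported in the paper:
    base LR 0.1, multiplied by gamma=0.1 at each milestone epoch.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# ImageNet variant described in the paper: decay by 10x every 30 epochs.
def imagenet_lr(epoch, base_lr=0.1, step=30, gamma=0.1):
    return base_lr * gamma ** (epoch // step)
```

For example, `step_lr(100)` gives 0.01 (one decay applied), and `step_lr(160)` gives 1e-4 (all three decays applied).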