Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training

Authors: Boyu Liu, Haoyu Huang, Linlin Yang, Yanjing Li, Guodong Guo, Xianbin Cao, Baochang Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | Our extensive experiments in two co-training scenarios demonstrate the effectiveness and versatility of TSQ-MTC. In particular, we successfully achieve a 4-bit quantized low-level visual foundation model based on IPT, which attains a PSNR comparable to the full-precision model while offering a 7.99× compression ratio in the ×4 super-resolution task on the Set5 benchmark.
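A quick sanity check on the reported figure: quantizing 32-bit weights to 4 bits gives an ideal weight-only compression ratio of 32/4 = 8×; the reported 7.99× sits just under that, plausibly because some parameters remain at full precision (an assumption, since the excerpt does not say).

```python
# Naive weight-compression ratio from bit-widths alone (illustrative arithmetic,
# not the paper's accounting).
full_precision_bits = 32
quantized_bits = 4

ideal_ratio = full_precision_bits / quantized_bits
print(ideal_ratio)  # prints 8.0, just above the reported 7.99x
```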
Researcher Affiliation | Academia | 1 Institute of Artificial Intelligence, Beihang University; 2 National Superior College for Engineers, Beihang University; 3 State Key Laboratory of Media Convergence and Communication, Communication University of China; 4 School of Electronic Information Engineering, Beihang University; 5 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo; 6 Hangzhou Innovation Institute, Beihang University; 7 Zhongguancun Laboratory; 8 Nanchang Institute of Technology
Pseudocode | Yes | Algorithm 1: Task-Specific Scales Quantization for Multi-Task Co-Training with TLMAQ
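The paper's Algorithm 1 is not reproduced in this report. As context for what a task-specific-scales quantizer does, the sketch below shows the generic building block such methods refine: symmetric uniform fake-quantization of shared weights, with one scale per task. All names and values are illustrative, not taken from the paper.

```python
# Minimal sketch (pure Python): shared weights quantized with a task-specific
# scale. This is the generic low-bit fake-quantization step, NOT the paper's
# Algorithm 1 (TSQ with TLMAQ), which builds additional machinery on top.

def fake_quantize(weights, scale, bits=4):
    """Round weights onto a signed 2**bits-level grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit signed
    qmin = -qmax - 1                      # -8
    out = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # quantize + clamp
        out.append(q * scale)                       # dequantize
    return out

# One learnable scale per task over shared weights (layout assumed for
# illustration; task names are hypothetical).
shared_weights = [0.31, -0.07, 0.52, -0.88]
task_scales = {"sr_x4": 0.1, "denoise": 0.15}
print(fake_quantize(shared_weights, task_scales["sr_x4"]))
```

Every output value is an integer multiple of the chosen scale, which is what makes the scale the single knob each task can adapt while the underlying weights stay shared.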
Open Source Code | No | The paper does not provide a direct link to a code repository, an explicit statement of code release, or a mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | Datasets. We evaluate the super-resolution performance on Set5 (Bevilacqua et al., 2012), Set14 (Zeyde et al., 2012), B100 (Martin et al., 2001), Urban100 (Huang et al., 2015). For the denoising task, we adopt CBSD68 (Martin et al., 2001) and Urban100. As for deraining, we use Rain100L (Yang et al., 2017). For all benchmarks, the performances are measured by PSNR. Datasets. For experiments on multi-modal data co-training, we utilize a SAR-RGB dataset collected from open-source datasets. Details of the dataset are provided in Appendix F. TSQ-MTC on IPT. ...co-training for 50 epochs on the ImageNet (Deng et al., 2009) dataset... for each super-resolution task, we perform 30 epochs of single-task fine-tuning on the DIV2K dataset (Timofte et al., 2017) respectively...
Dataset Splits | Yes | TSQ-MTC on IPT. PyTorch is used to implement both our baselines and TSQ-MTC. The training is conducted on NVIDIA Tesla A100 GPUs, each with 80 GB memory, using the Adam optimizer with β1 = 0.9 and β2 = 0.999. We initialize all the quantized models from full-precision pre-trained weights and perform co-training for 50 epochs on the ImageNet (Deng et al., 2009) dataset. The learning rate starts at 5×10⁻⁵ and decays to 2×10⁻⁵ throughout the training, with a batch size of 225. SSIM is the loss for SLLD, with a weight coefficient of 0.01. To ensure a fair comparison with the full-precision IPT model, for each super-resolution task, we perform 30 epochs of single-task fine-tuning on the DIV2K dataset (Timofte et al., 2017) respectively, with a learning rate of 1e-6.
Hardware Specification | Yes | TSQ-MTC on IPT. PyTorch is used to implement both our baselines and TSQ-MTC. The training is conducted on NVIDIA Tesla A100 GPUs, each with 80 GB memory, using the Adam optimizer with β1 = 0.9 and β2 = 0.999. TSQ-MTC on CNNs. ...The training is conducted on 2 NVIDIA Tesla A5000 GPUs using the Adam optimizer, initialized with full-precision weights.
Software Dependencies | No | PyTorch is used to implement both our baselines and TSQ-MTC. The training is conducted on NVIDIA Tesla A100 GPUs, each with 80 GB memory, using the Adam optimizer with β1 = 0.9 and β2 = 0.999. TSQ-MTC on CNNs. ...The training is conducted on 2 NVIDIA Tesla A5000 GPUs using the Adam optimizer, initialized with full-precision weights.
Experiment Setup | Yes | TSQ-MTC on IPT. PyTorch is used to implement both our baselines and TSQ-MTC. The training is conducted on NVIDIA Tesla A100 GPUs, each with 80 GB memory, using the Adam optimizer with β1 = 0.9 and β2 = 0.999. We initialize all the quantized models from full-precision pre-trained weights and perform co-training for 50 epochs on the ImageNet (Deng et al., 2009) dataset. The learning rate starts at 5×10⁻⁵ and decays to 2×10⁻⁵ throughout the training, with a batch size of 225. SSIM is the loss for SLLD, with a weight coefficient of 0.01. To ensure a fair comparison with the full-precision IPT model, for each super-resolution task, we perform 30 epochs of single-task fine-tuning on the DIV2K dataset (Timofte et al., 2017) respectively, with a learning rate of 1e-6. TSQ-MTC on CNNs. We use ResNet-18, ResNet-34, ResNet-50, and ResNet-101 as shared-parameter backbones for multi-modal data co-training... The training is conducted on 2 NVIDIA Tesla A5000 GPUs using the Adam optimizer, initialized with full-precision weights. We do co-training for 50 epochs on our SAR-RGB dataset with an initial learning rate of 1×10⁻⁵ multiplied by 0.1 at epoch 25.
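The quoted IPT recipe (Adam with β1 = 0.9, β2 = 0.999; learning rate decaying from 5×10⁻⁵ to 2×10⁻⁵ over 50 epochs; batch size 225) can be sketched as below. The excerpt gives only the start and end learning rates, so a linear decay is assumed here purely for illustration; the paper may use a different schedule.

```python
# Hedged sketch of the quoted co-training schedule. Only the endpoints
# (5e-5 -> 2e-5 over 50 epochs) come from the paper; the linear shape
# of the decay is an assumption.

def lr_at_epoch(epoch, epochs=50, lr_start=5e-5, lr_end=2e-5):
    """Learning rate linearly annealed from lr_start (epoch 0) to lr_end."""
    frac = epoch / (epochs - 1)
    return lr_start + frac * (lr_end - lr_start)

# Optimizer settings quoted from the paper excerpt.
adam_config = {"betas": (0.9, 0.999), "batch_size": 225}

print(lr_at_epoch(0))   # prints 5e-05
print(lr_at_epoch(49))  # ~2e-05 at the final epoch
```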