Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

Authors: Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both theoretical analysis and experimental results validate our approach. Using the CIFAR10 dataset (just 0.5% of the size of the ImageNet training data) for MPQ policy search, we achieved equivalent accuracy on ImageNet at a significantly lower computational cost, improving efficiency by up to 150% over the baselines. The paper includes a dedicated '3. Experiments' section with '3.2. Comparison with State-Of-The-Art' and '3.3. Ablation Study', featuring tables of accuracy and efficiency, thus indicating an experimental research type.
Researcher Affiliation | Collaboration | The affiliations listed include '1College of Software, Northeastern University, Shenyang, China' and '2The Department of Computer Science, City University of Hong Kong, Hong Kong, China', which are academic institutions, and '3The Department of Intelligent Manufacturing, CATL, Ningde, China', which is an industry affiliation. This mix indicates a collaboration between academia and industry.
Pseudocode | No | The paper describes the methodology using mathematical formulations and textual descriptions across sections such as '2. Approach', '2.2. Exploiting the Loss-Sharpness Information', and '2.3. Generalizable MPQ via Adaptive Sharpness-Aware Gradient Aligning', but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | The proxy datasets Dtrain proxy for MPQ search include CIFAR10 (Krizhevsky et al., 2009), Flowers (Nilsback & Zisserman, 2008), and Food (Bossard et al., 2014). For image classification, the target large dataset Dval for model inference is ImageNet (Deng et al., 2009). For object detection, the target dataset is VOC (Everingham et al., 2010).
Dataset Splits | Yes | For image classification, the target large dataset Dval for model inference is ImageNet (Deng et al., 2009) with 1000 categories, containing 1.28M training samples and 50K validation samples. For object detection, the target dataset is VOC (Everingham et al., 2010) with 20 categories, containing about 1.6K training samples and 5K validation samples.
Hardware Specification | No | The paper mentions '72 GPU hours' as a measure of computational cost but does not provide specific details about the GPU models, CPU types, or other hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions using 'SGD as base optimizer' and the 'SGD optimizer with Nesterov momentum' in Supplementary Material A.5, but does not provide specific software names or version numbers for programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | For policy searching, we adopt SGD as the base optimizer with an initial learning rate of 0.01 for 90 epochs. Empirically, we find the sharpness of the loss landscape is not sensitive to the hyper-parameter ϵ, and thus set ϵ = 0.1 for all proxy datasets. We use the full-precision model (trained on Dtrain) as the initialization and adopt the SGD optimizer with Nesterov momentum (Sutskever et al., 2013) and an initial learning rate of 0.04. We use a cosine learning rate scheduler and finetune the model until convergence, with the first 5 finetune epochs used as warm-up.
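The finetuning schedule described above (initial learning rate 0.04, cosine decay, 5 warm-up epochs) can be sketched as a small standalone function. This is a minimal illustration, not the paper's implementation: the total epoch count `total_epochs=90` is borrowed from the policy-search phase, since the paper only says the model is finetuned "until convergence", and the linear warm-up shape is an assumption.

```python
import math

def lr_at_epoch(epoch, total_epochs=90, base_lr=0.04, warmup_epochs=5):
    """Hypothetical learning-rate schedule: linear warm-up followed by
    cosine decay, matching the finetuning setup described in the paper.

    Assumptions (not stated in the paper): total_epochs and the linear
    shape of the warm-up ramp.
    """
    if epoch < warmup_epochs:
        # Linear warm-up from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))
```

For example, `lr_at_epoch(5)` returns the full base rate 0.04 (first post-warm-up epoch), and the rate then decays smoothly toward zero by the final epoch.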