SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Authors: Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Qinshuo Liu, Xianglong Liu, Luca Benini, Michele Magno, Shiming Zhang, Xiaojuan Qi

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths. For example, a 2-bit quantized LLaMA-7B model reduces memory usage by nearly 6x compared to the floating-point baseline, decreases perplexity by 48% compared to state-of-the-art gradient-free PTQ methods, and maintains GPU inference speed.
Researcher Affiliation Academia ¹The University of Hong Kong, ²ETH Zürich, ³Beihang University. Correspondence to: Haotong Qin, Shiming Zhang, Xiaojuan Qi <EMAIL, EMAIL, EMAIL>.
Pseudocode Yes Algorithm 1 Main Framework of SliM-LLM. func SliM-LLM(w, xF, β, λ, N) ... Algorithm 2 Detailed functions in SliM-LLM. func SBA(w, xF, Hin, β, N)
Open Source Code Yes Our code is available at https://github.com/Aaronhuang-778/SliM-LLM.
Open Datasets Yes Experiments are carried out on the WikiText2 (Merity et al., 2016) and C4 (Raffel et al., 2020) datasets.
Dataset Splits Yes We randomly select 128 samples from WikiText2 (Merity et al., 2016) as calibration data, each with 2048 tokens.
Hardware Specification Yes the quantization is carried out on a single NVIDIA A800 GPU. For SliM-LLM+, we employ the AdamW optimizer, following OmniQuant (Shao et al., 2023), which is also feasible on a single A800.
Software Dependencies No The paper mentions the "open-source AutoGPTQ" for extending the CUDA kernel, but does not specify a version for AutoGPTQ, CUDA, or any other software component.
Experiment Setup Yes Per-channel group quantization is utilized in our framework with groupsize = 128 in experiments. Since no backpropagation in SliM-LLM, the quantization is carried out on a single NVIDIA A800 GPU. For SliM-LLM+, we employ the AdamW optimizer, following OmniQuant (Shao et al., 2023)... We randomly select 128 samples from WikiText2 (Merity et al., 2016) as calibration data, each with 2048 tokens... We empirically set λ at 0.1 and n at 50 to achieve a balance between efficiency and accuracy.
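The Research Type row cites a roughly 6x memory reduction for a 2-bit LLaMA-7B against the floating-point baseline. A quick sanity check is possible under stated assumptions (FP16 baseline, one FP16 scale and one FP16 zero point per group of 128 weights; these storage-format details are assumptions, not taken from the paper):

```python
# Back-of-envelope check of the ~6x memory claim. Assumptions: FP16
# baseline, 2-bit weight payload, FP16 scale + FP16 zero point per
# group of 128 weights. Unquantized embeddings and runtime buffers,
# which the real figure includes, are ignored here.
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a given effective bit-width."""
    return n_params * bits_per_weight / 8 / 2**30

n = 7e9                                    # LLaMA-7B parameter count
fp16 = weight_gib(n, 16)
quant = weight_gib(n, 2 + 32 / 128)        # payload + per-group metadata
print(f"{fp16 / quant:.1f}x smaller")      # ~7.1x for weights alone
```

Weights alone compress about 7x under these assumptions; the paper's "nearly 6x" end-to-end figure is consistent once unquantized components are counted.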
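The Pseudocode row names an SBA function (salience-determined bit allocation). As a hedged illustration of the general idea only, the sketch below assigns higher precision to the most salient weight groups while preserving the average bit-width; the function name, the quartile split, and the salience aggregation are all hypothetical simplifications, not the paper's actual SBA objective:

```python
import numpy as np

def sba_sketch(salience: np.ndarray, target_bits: int = 2,
               group_size: int = 128) -> np.ndarray:
    """Hypothetical sketch of salience-driven mixed precision: groups
    with the highest aggregate salience are promoted to target_bits+1,
    the least salient demoted to target_bits-1, in equal numbers so
    the average bit-width stays at target_bits."""
    group_sal = salience.reshape(-1, group_size).sum(axis=1)
    order = np.argsort(-group_sal)            # most salient groups first
    bits = np.full(len(group_sal), target_bits)
    k = len(group_sal) // 4                   # promote/demote equal counts
    bits[order[:k]] += 1                      # salient -> higher precision
    bits[order[-k:]] -= 1                     # least salient -> lower precision
    return bits

sal = np.random.rand(8 * 128)                 # toy per-weight salience scores
bits = sba_sketch(sal)
print(bits.mean())                            # 2.0: average bit-width preserved
```

Promoting and demoting equal numbers of groups is what keeps the memory budget identical to uniform 2-bit quantization while concentrating precision where it matters.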
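The Experiment Setup row specifies per-channel group quantization with groupsize = 128. A minimal sketch of plain asymmetric min-max group quantization at that group size follows; note this is the standard baseline scheme, not SliM-LLM's salience-weighted variant:

```python
import numpy as np

def group_quantize(w: np.ndarray, bits: int = 2,
                   group_size: int = 128) -> np.ndarray:
    """Quantize-dequantize with one min-max (scale, offset) pair per
    contiguous group of `group_size` weights. Standard asymmetric
    rounding; SliM-LLM's salience-aware rounding is not reproduced."""
    qmax = 2**bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / qmax, 1.0)  # avoid div-by-zero
    q = np.clip(np.round((g - lo) / scale), 0, qmax)  # integer codes
    return (q * scale + lo).reshape(w.shape)          # dequantized weights

w = np.random.randn(4, 256).astype(np.float32)
wq = group_quantize(w)                    # error bounded by scale/2 per group
```

Smaller groups track local weight statistics more tightly at the cost of more scale/offset metadata; groupsize = 128 is the common trade-off the paper adopts.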