STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Authors: Peijie Dong, Lujun Li, Yuedong Zhong, DaYou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the LLaMA, OPT, and Mistral families. STBLLM achieves a perplexity of 11.07 at 0.55 bits per weight, outperforming BiLLM by 3 ...
Researcher Affiliation | Academia | 1 HKUST(GZ), 2 HKUST, 3 SYSU, 4 HIT(SZ)
Pseudocode | Yes | Algorithm 1: Framework of STBLLM; details of each function are shown in Algorithm 2 (STBLLM).
Open Source Code | Yes | Code is released at https://github.com/pprp/STBLLM.
Open Datasets | Yes | We measure the perplexity for language generation tasks on Wikitext2 (Merity et al., 2016), C4 (Raffel et al., 2020), and PTB (Marcus et al., 1993), and accuracy for zero-shot tasks including Winogrande (Sakaguchi et al., 2021), OBQA (Mihaylov et al., 2018), HellaSwag (Zellers et al., 2019), BoolQ (Clark et al., 2019), ARC (Clark et al., 2018), and RTE (Chakrabarty et al., 2021).
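For reference, perplexity on these language-modeling benchmarks is conventionally computed as the exponential of the average per-token negative log-likelihood. A minimal sketch of that definition (not the authors' evaluation script; the function name is illustrative):

```python
import math

def perplexity(segment_nlls, segment_token_counts):
    """Perplexity = exp(total negative log-likelihood / total tokens).

    segment_nlls: total NLL (in nats) of each evaluated text segment.
    segment_token_counts: number of scored tokens in each segment.
    """
    total_nll = sum(segment_nlls)
    total_tokens = sum(segment_token_counts)
    return math.exp(total_nll / total_tokens)
```

A model that assigns every token a probability of 1/4 over two tokens (total NLL of 2·ln 4 nats) yields a perplexity of 4.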
Dataset Splits | Yes | For perplexity evaluation in Tables 2 and 3, we employ the C4 dataset as the calibration dataset and report the perplexity on Wikitext2. We conduct experiments on LLaMA-1/2/3 (Touvron et al., 2023a;b), OPT (Zhang et al., 2022a), and Mistral (Jiang et al., 2023). ... We extend our experiments to 7 zero-shot datasets on LLaMA-1-13B, LLaMA-2-13B, and LLaMA-1-30B, each tested with Full Precision, BiLLM(6:8), BiLLM(4:8), STBLLM(6:8), and STBLLM(4:8) methods.
Hardware Specification | Yes | Most LLMs except 65B can be evaluated on a single NVIDIA A800 GPU. For the LLaMA-1-65B model, we employ four NVIDIA A800 GPUs for evaluation. It takes 1.8 hours for the post-training process of 7B models on an RTX 4090 GPU and 2.8 hours for 13B models on an A6000 GPU.
Software Dependencies | No | Our STBLLM utilizes the PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019) libraries.
Experiment Setup | Yes | For a fair comparison, we set the same block size of 128. ... We compare the results of STBLLM with BiLLM under the same N:M settings. For more information on average bits under N:M settings, please refer to Table 1. ... We evaluate the perplexity of LLaMA-1-7B and LLaMA-2-7B with group sizes of 64, 128, 256, and 512. Generally, performance improves as the group size increases, but so do computational and storage demands; we choose a group size of 128 to balance performance and resource consumption.
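To make the N:M notation above concrete: an N:M setting (e.g., 4:8) retains N non-zero weights out of every M consecutive weights. Below is a minimal NumPy sketch of N:M structured binarization, assuming a simple magnitude-based importance rule; the paper's actual salience metric and residual binarization scheme differ, and `nm_structured_binarize` is a hypothetical helper, not the released implementation:

```python
import numpy as np

def nm_structured_binarize(w, n=4, m=8):
    """Illustrative N:M structured binarization: within every group of m
    consecutive weights, keep the n largest-magnitude entries, replacing
    each with sign(weight) * per-group scale, and prune the rest to zero.
    """
    w = np.asarray(w, dtype=np.float64)
    assert w.size % m == 0, "weight count must be divisible by m"
    groups = w.reshape(-1, m)
    out = np.zeros_like(groups)
    for i, g in enumerate(groups):
        keep = np.argsort(np.abs(g))[-n:]        # indices of the n largest magnitudes
        scale = np.abs(g[keep]).mean()           # per-group scaling factor
        out[i, keep] = np.sign(g[keep]) * scale  # binarized survivors: +/- scale
    return out.reshape(w.shape)
```

Each surviving weight then needs only a sign bit plus its group's shared scale and sparsity mask, which is how structured binarization pushes the average storage cost below 1 bit per weight.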