STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Authors: Peijie Dong, Lujun Li, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on LLaMA, OPT, and Mistral family. STBLLM achieves a perplexity of 11.07 at 0.55 bits per weight, outperforming the BiLLM by 3. |
| Researcher Affiliation | Academia | 1 HKUST(GZ) 2 HKUST 3 SYSU 4 HIT(SZ) |
| Pseudocode | Yes | Algorithm 1 Framework of STBLLM: Details of each function are shown in Algorithm 2. Algorithm 2 STBLLM |
| Open Source Code | Yes | Code is released at https://github.com/pprp/STBLLM. |
| Open Datasets | Yes | We measure the perplexity for language generation tasks on Wikitext2 (Merity et al., 2016), C4 (Raffel et al., 2020) and PTB (Marcus et al., 1993), and accuracy for the zero-shot tasks including Winogrande (Sakaguchi et al., 2021), OBQA (Mihaylov et al., 2018), Hellaswag (Zellers et al., 2019), BoolQ (Clark et al., 2019), ARC (Clark et al., 2018) and RTE (Chakrabarty et al., 2021). |
| Dataset Splits | Yes | For perplexity evaluation in Table 2 and 3, we employ the C4 dataset as the calibration dataset and report the perplexity on Wikitext2. We conduct experiments on LLaMA-1/2/3 (Touvron et al., 2023a;b), OPT (Zhang et al., 2022a), and Mistral (Jiang et al., 2023). ... We extend our experiments to 7 zero-shot datasets on LLaMA-1-13B, LLaMA-2-13B, and LLaMA-1-30B, each tested with Full Precision, BiLLM(6:8), BiLLM(4:8), STBLLM(6:8), and STBLLM(4:8) methods. |
| Hardware Specification | Yes | Most LLMs except 65B can be evaluated on a single NVIDIA A800 GPU. For the LLaMA-1-65B model, we employ four NVIDIA A800 GPUs for evaluation. It takes 1.8 hours for the post-training process of 7B models on an RTX 4090 GPU and 2.8 hours for 13B models on an A6000 GPU. |
| Software Dependencies | No | Our STBLLM utilizes PyTorch (Paszke et al., 2019) and Huggingface (Wolf et al., 2019) libraries. |
| Experiment Setup | Yes | For a fair comparison, we set the same block size to 128. ... We compare the results of STBLLM with BiLLM under the same N:M settings. For more information on average bits under N:M settings, please refer to Table 1. ... We evaluate the perplexity of LLaMA-1-7B and LLaMA-2-7B with group sizes of 64, 128, 256, and 512. Generally, as the group size increases, performance improves. However, this also results in higher computational and storage demands. We choose a group size of 128 to balance performance and resource consumption. |
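To make the N:M settings in the table concrete, the sketch below illustrates the general idea of N:M structured binarization: within every group of M consecutive weights, keep the N largest-magnitude entries, binarize them to sign(w) times a per-row scale, and prune the rest. This is a minimal approximation, not the authors' exact STBLLM procedure (which additionally uses saliency-aware selection and finer-grained compensation); the helper names `nm_binarize` and `nominal_avg_bits` are hypothetical.

```python
import numpy as np

def nm_binarize(weights, n=4, m=8):
    """Illustrative N:M structured binarization (not the exact
    STBLLM algorithm): per group of M consecutive weights, keep
    the N largest by magnitude, set them to sign(w) * alpha, and
    zero the rest. alpha is the mean |w| of kept entries per row."""
    w = np.asarray(weights, dtype=np.float64)
    rows, cols = w.shape
    assert cols % m == 0, "columns must be divisible by M"
    groups = w.reshape(rows, cols // m, m)
    # Rank entries within each group by magnitude; keep the top N.
    order = np.argsort(np.abs(groups), axis=-1)
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., -n:], True, axis=-1)
    mask = mask.reshape(rows, cols)
    # One scaling factor per row over the surviving weights.
    alpha = (np.abs(w) * mask).sum(axis=1, keepdims=True) \
        / mask.sum(axis=1, keepdims=True)
    return np.sign(w) * alpha * mask

def nominal_avg_bits(n, m):
    """Naive storage estimate: 1 sign bit for each of the N kept
    weights per M, i.e. N/M bits per weight before any mask or
    scale overhead (so 4:8 gives 0.5, close to the 0.55 bits
    reported in the paper once overhead is counted)."""
    return n / m
```

Under this naive accounting, the 4:8 setting gives 0.5 bits per weight; the paper's reported 0.55 bits additionally amortizes mask and scale storage.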
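The Wikitext2/C4/PTB perplexity numbers discussed above are conventionally computed as the exponential of the mean per-token negative log-likelihood over the evaluation corpus. A minimal sketch of that reduction (the function name `perplexity` is our own, and the NLL values would come from a model's forward pass, which is omitted here):

```python
import math

def perplexity(token_nlls):
    """Corpus perplexity from a list of per-token negative
    log-likelihoods: exp of their mean. Lower is better; a
    uniform model over V tokens yields perplexity V."""
    assert token_nlls, "need at least one token"
    return math.exp(sum(token_nlls) / len(token_nlls))
```

For example, a model that assigns every token probability 1/2 (NLL = ln 2) has perplexity 2 regardless of corpus length.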