Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

Authors: Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments demonstrate that Basis Sharing outperforms state-of-the-art SVD-based compression approaches and parameter sharing techniques, especially under large compression ratios. [...] We conduct extensive experiments on a variety of LLMs, including the LLaMA family (Touvron et al., 2023a;b), OPT-6.7B (Zhang et al., 2022), Mistral-7B (Jiang et al., 2023a), and GPT-2 (Radford et al., 2019). Our Basis Sharing can surpass the state-of-the-art SVD-based methods in both generation tasks and downstream reasoning tasks without any fine-tuning under compression ratios from 20% to 50%. Specifically, compared with state-of-the-art SVD-based compression approaches, Basis Sharing can further reduce the perplexity by up to 25% on generation tasks and improve accuracy by up to 4% on downstream reasoning tasks under the same compression ratio. [...] 4 EXPERIMENTS
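The quoted claims concern SVD-based compression with a basis shared across layers. A minimal NumPy sketch of that general idea follows; the toy sizes, the rank, and the concatenation-SVD construction are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

# Toy illustration of sharing one SVD basis across two layers' weight
# matrices. Sizes, rank, and the concatenation-SVD construction are
# illustrative assumptions, not the paper's exact method.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((128, 64))   # stand-in for a projection in layer i
W2 = rng.standard_normal((128, 64))   # the corresponding projection in layer j
rank = 16

# One SVD over the concatenated matrices yields a basis B stored once;
# each layer then keeps only a small per-layer coefficient matrix.
U, S, Vt = np.linalg.svd(np.hstack([W1, W2]), full_matrices=False)
B = U[:, :rank]                       # shared basis (128 x 16), stored once
C1, C2 = B.T @ W1, B.T @ W2           # per-layer coefficients (16 x 64 each)
W1_hat, W2_hat = B @ C1, B @ C2       # low-rank reconstructions

# Storage comparison: shared basis vs. a separate rank-16 basis per layer.
shared_params = B.size + C1.size + C2.size
per_layer_params = 2 * (B.size + C1.size)
```

Sharing the basis amortizes its storage over both layers, which is where the extra savings over plain per-layer SVD truncation come from.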
Researcher Affiliation | Academia | Jingcun Wang, Technical University of Darmstadt (EMAIL); Yu-Guang Chen, National Central University (EMAIL); Ing-Chao Lin, National Cheng Kung University (EMAIL); Bing Li, University of Siegen (EMAIL); Grace Li Zhang, Technical University of Darmstadt (EMAIL)
Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text, accompanied by figures. It does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a link to a code repository. It mentions using existing frameworks like the 'LM-Evaluation-Harness framework (Gao et al., 2024)' and 'Hugging Face PEFT', but these are third-party tools, not the authors' own implementation.
Open Datasets | Yes | Three language modeling datasets used in our experiment include WikiText-2 (Merity et al., 2016), PTB (Marcus et al., 1993) and C4 (Raffel et al., 2019). Seven reasoning datasets used in the experiments include OpenbookQA (Banerjee et al., 2020), WinoGrande (Sakaguchi et al., 2021), HellaSwag (Zellers et al., 2019), PIQA (Bisk et al., 2020), MathQA (Amini et al., 2019), ARC-e, ARC-c (Clark et al., 2018). All the reasoning tasks are tested in a zero-shot setting with the implementation of the LM-Evaluation-Harness framework (Gao et al., 2024). [...] For reasoning tasks, the S of the results outside the bracket is evaluated with WikiText-2, while inside is evaluated with Alpaca.
Dataset Splits | No | The paper mentions using "256 samples from WikiText-2 (Merity et al., 2016) with each 2048 tokens to evaluate X" and "Each model is fine-tuned with WikiText-2 training dataset for two epochs". While it references a training dataset and sample counts for a specific evaluation, it does not provide explicit training/validation/test splits (e.g., percentages or absolute counts for all splits) for the general experimental setup that would be necessary for reproduction.
Hardware Specification | Yes | All experiments are tested on two NVIDIA A100 80GB GPUs. [...] out of memory error occurs on an A100 GPU. [...] we compared the performance of LLaMA-7B with and without Basis Sharing on a single A100 GPU
Software Dependencies | No | The paper mentions "Hugging Face", "Hugging Face PEFT", and the "LM-Evaluation-Harness framework (Gao et al., 2024)" as tools used. However, it does not provide specific version numbers for these software components, nor for programming languages or other libraries, which would be necessary to reproduce the software environment.
Experiment Setup | Yes | S is derived through 256 samples from WikiText-2 with 2048 sequence length. [...] We used lora_r = 8, lora_alpha = 32, and learning_rate = 1e-4, and used defaults for all other hyperparameters in the Hugging Face PEFT. Each model is fine-tuned with the WikiText-2 training dataset for two epochs. [...] The differences from LoRA fine-tuning are that we use here learning_rate = 2e-6 and two A100 GPUs.
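The reported hyperparameters can be assembled into a Hugging Face PEFT configuration. In this sketch only lora_r = 8, lora_alpha = 32, learning_rate = 1e-4, and the two epochs come from the paper; the model checkpoint, output directory, and all other settings are illustrative assumptions:

```python
# Sketch of the reported LoRA fine-tuning setup with Hugging Face PEFT.
# Only r, lora_alpha, learning_rate, and num_train_epochs are from the
# paper; the checkpoint name and remaining fields are assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint
lora_cfg = LoraConfig(r=8, lora_alpha=32)  # reported values; other fields left at PEFT defaults
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="lora-wikitext2",  # assumed path
    learning_rate=1e-4,           # reported LoRA rate (the paper's other variant uses 2e-6)
    num_train_epochs=2,           # two epochs on the WikiText-2 training set
)
```

A Trainer wired to the WikiText-2 training split would then consume `model` and `args`; the paper does not specify batch size or scheduler, so those stay at library defaults here.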