ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform comprehensive experiments for ComPEFT to evaluate: (1) the performance of the compressed model on its original tasks, (2) the number of bits needed to store the models, (3) the mergeability and composability of the compressed checkpoints, and (4) how ComPEFT compares to other existing PEFT methods. |
| Researcher Affiliation | Collaboration | 1 UNC-Chapel Hill, 2 MIT, 3 MIT-IBM Watson AI Lab, 4 University of Toronto, 5 Vector Institute. Correspondence Email: {EMAIL} |
| Pseudocode | Yes | Algorithm 1 ComPEFT Compression Procedure. Input: Task vector τt, k, and a scaling value α. Output: Compressed task vector τt |
| Open Source Code | Yes | 1Code is available at https://github.com/prateeky2806/ComPEFT. |
| Open Datasets | Yes | We follow the experimental setting from the QLoRA paper (Dettmers et al., 2023) and experiment with 8 recent instruction-following datasets that are diverse in terms of languages and dataset sizes. This collection includes datasets generated by language models (Alpaca (Taori et al., 2023), self-instruct (Wang et al., 2022), and unnatural-instructions (Honovich et al., 2022)), a multitask dataset (FLAN-v2 (Chung et al., 2022a)), two datasets created via human annotation and feedback (OASST1 (Köpf et al., 2023) and HH-RLHF (Bai et al., 2022)), and two hybrid datasets (Chip2 (LAION, 2023) and Longform (Köksal et al., 2023)). |
| Dataset Splits | Yes | For the experiments in 3.2 and 3.3 on the 7 GLUE (Wang et al., 2018a) tasks, we trained the large datasets (mnli, qnli, sst2, qqp) for 1 epoch and the small datasets (rte, mrpc, wnli) for 10 epochs. Whereas for the experiment in 3.5, we followed most of the hyperparameter configuration from the (IA)3 (Liu et al., 2022) paper and trained for 2500 steps with a batch size of 8. For each of the 11 datasets in 3.5, we selected 200 examples from the training set to be used as the validation set for best model selection as well as selecting the hyperparameters for ComPEFT. |
| Hardware Specification | Yes | We used a single 48GB NVIDIA A6000 GPU for these experiments. |
| Software Dependencies | No | The paper mentions software components like bfloat16 (data type) and refers to using code from original authors for merging methods, but does not specify version numbers for any key software libraries or packages used for their own implementation. |
| Experiment Setup | Yes | In all experiments, we sweep both α and k in the following ranges: k ∈ {5, 10, 20, 30, 50} and α ∈ {0.5, 1, 2, 3, 4, 5, 6, 8, 10}... For training (IA)3 models we selected the learning rate from {1e-2, 1e-3, 1e-4, 1e-5}, for LoRA from {5e-2, 5e-3, 5e-4, 5e-5}, and for full model finetuning from {5e-3, 5e-4, 5e-5, 5e-6}. During the training process, bfloat16 was adopted to curtail GPU memory expenditure. |
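
The compression procedure quoted in the Pseudocode row (Algorithm 1: sparsify a task vector keeping the top-k% entries by magnitude, then quantize the survivors to their sign with a scaling value α) can be sketched as below. This is a hypothetical illustration, not the authors' released code: the function name `compeft_compress` is invented, and using α times the standard deviation of the original task vector as the quantized magnitude is an assumption about the scale definition.

```python
import statistics

def compeft_compress(task_vector, k_percent, alpha):
    """Hypothetical sketch of a ComPEFT-style compression step:
    keep the top-k% entries of the task vector by magnitude,
    replace each survivor with its sign, and scale by alpha times
    the standard deviation of the original vector (assumed scale)."""
    n_keep = max(1, int(len(task_vector) * k_percent / 100))
    # Rank indices by magnitude; the first n_keep survive sparsification.
    ranked = sorted(range(len(task_vector)),
                    key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(ranked[:n_keep])
    # Single shared magnitude -> survivors need only 1 bit (the sign) each.
    scale = alpha * statistics.pstdev(task_vector)
    return [scale * (1.0 if v > 0 else -1.0) if i in keep else 0.0
            for i, v in enumerate(task_vector)]

# Example: k=50 keeps the two largest-magnitude entries of a length-4 vector.
compressed = compeft_compress([0.9, -0.4, 0.05, -0.02], 50, 1.0)
```

Because every retained entry shares one magnitude, a checkpoint reduces to a sparse sign pattern plus a single scalar, which is what makes the α and k sweep in the Experiment Setup row the only compression hyperparameters.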