MCNC: Manifold-Constrained Reparameterization for Neural Compression

Authors: Chayne Thrash, Reed Andreas, Ali Abbasi, Parsa Nooralinejad, Soroush Abbasi Koohpayegani, Hamed Pirsiavash, Soheil Kolouri

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments in computer vision and natural language processing tasks, we demonstrate that our method significantly outperforms state-of-the-art baselines in terms of compression, accuracy, and/or model reconstruction time."
Researcher Affiliation | Academia | "1 Department of Computer Science, Vanderbilt University, Nashville, TN 37235; 2 Department of Computer Science, University of California, Davis, CA 95616"
Pseudocode | Yes | "A.1 EXAMPLE CODE FOR APPLYING MCNC: Below, we include PyTorch code which shows how to create a linear layer which is reparameterized using MCNC. import torch; import torch.nn as nn; import torch.nn.functional as F; from math import ceil; class MCNC_Linear(nn.Linear): …"
Open Source Code | Yes | "Our code is publicly available at https://github.com/mint-vu/MCNC."
Open Datasets | Yes | "We begin by evaluating on training Vision Transformers from scratch on the ImageNet-100 dataset (Tian et al., 2020). We compare MCNC on CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009). We conducted fine-tuning experiments on two variants of the LLaMA-2 (Touvron et al., 2023) language model, LLaMA 7B and LLaMA 13B, using the Alpaca dataset as our training data (Taori et al., 2023)."
Dataset Splits | Yes | "We begin by evaluating on training Vision Transformers from scratch on the ImageNet-100 dataset (Tian et al., 2020). We compare MCNC on CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009). We conducted fine-tuning experiments on two variants of the LLaMA-2 (Touvron et al., 2023) language model, LLaMA 7B and LLaMA 13B, using the Alpaca dataset as our training data (Taori et al., 2023). We report both training loss and validation loss on the Alpaca dataset."
Hardware Specification | Yes | "We use a single RTX 3090 GPU to calculate the throughput. We perform this experiment 100 times using an RTX A6000 and report the average timings in Table 8."
Software Dependencies | No | "We use the training code from QLoRA (Dettmers et al., 2023) and NOLA (Koohpayegani et al., 2024) for our experiments."
Experiment Setup | Yes | "We provide hyperparameters used for our method and baselines in section A.3. Implementation Details: We use the training code from QLoRA (Dettmers et al., 2023) and NOLA (Koohpayegani et al., 2024) for our experiments. We quantize the original parameters of the language model to 4-bit and fine-tune the adapter applied to all layers of the transformer. For NOLA, we followed the hyperparameters reported in (Koohpayegani et al., 2024). We set the rank to 8 for both NOLA and our method in LLaMA 7B, and to 16 in LLaMA 13B. In our method, we use a generator with the following specifications: a 3-layer MLP with an input dimension of 5, an output dimension of 5000, and a hidden dimension of 32. In NOLA, we use 64 bases for A and B in LLaMA 7B and 140 bases for LLaMA 13B. These numbers of bases result in the same number of optimized parameters as our method for each architecture. All other hyperparameters were identical for both methods except the learning rate; we used lr = 0.001 for NOLA and 0.01 for ours."
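The pseudocode and experiment-setup rows above quote a generator that is a 3-layer MLP mapping a 5-dimensional input to a 5000-dimensional output with hidden width 32. The sketch below illustrates the general idea such a setup implies: optimize a handful of low-dimensional inputs constrained to the unit sphere (a compact manifold), and let a small frozen random generator expand them into chunks of a layer's weight matrix. This is a minimal NumPy illustration, not the authors' implementation (see their repository for the real `MCNC_Linear`); the chunking scheme, sphere projection, and generator initialization here are assumptions.

```python
import numpy as np
from math import ceil

rng = np.random.default_rng(0)

# Generator sizes quoted in the setup row above.
IN_DIM, HID, OUT_DIM = 5, 32, 5000

# Frozen, randomly initialized 3-layer MLP g: R^5 -> R^5000.
# It is never trained, so it can be reproduced from a seed at load time.
W1, b1 = rng.standard_normal((IN_DIM, HID)), np.zeros(HID)
W2, b2 = rng.standard_normal((HID, HID)), np.zeros(HID)
W3, b3 = rng.standard_normal((HID, OUT_DIM)), np.zeros(OUT_DIM)

def generator(z):
    h = np.tanh(z @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    return h @ W3 + b3

def reparameterized_weight(zs, out_features, in_features):
    """Assemble an (out, in) weight matrix from sphere-constrained inputs.

    Each trained parameter vector z is projected onto the unit sphere
    (the manifold constraint), expanded to 5000 values by the frozen
    generator, and the chunks are concatenated and trimmed to size.
    """
    n = out_features * in_features
    chunks = [generator(z / np.linalg.norm(z)) for z in zs]
    return np.concatenate(chunks)[:n].reshape(out_features, in_features)

# A 128x256 linear layer (32768 weights) needs ceil(32768 / 5000) = 7 chunks,
# i.e. only 7 * 5 = 35 trained scalars instead of 32768.
out_f, in_f = 128, 256
n_chunks = ceil(out_f * in_f / OUT_DIM)
zs = rng.standard_normal((n_chunks, IN_DIM))
W = reparameterized_weight(zs, out_f, in_f)
print(W.shape, n_chunks * IN_DIM)  # (128, 256) 35
```

During training, only `zs` would receive gradients (through the frozen generator), which is what makes the number of optimized parameters comparable between this scheme and NOLA's basis counts in the table above.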