Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition

Authors: Ismail Alkhouri, Xitong Zhang, Rongrong Wang

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results (i) demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks, and (ii) illustrate that we achieve either competitive or state-of-the-art results when compared to leading structured pruning and low-rank training methods in terms of FLOPs and parameters drop.
Researcher Affiliation | Academia | Ismail R. Alkhouri (Department of Computational Mathematics, Science & Engineering, Michigan State University; Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor); Xitong Zhang (Department of Computational Mathematics, Science & Engineering, Michigan State University); Rongrong Wang (Department of Computational Mathematics, Science & Engineering and Department of Mathematics, Michigan State University)
Pseudocode | Yes | Algorithm 1: Compression with LoRITa+SVT. Input: L trainable weights W_i, i ∈ [L], factorization parameter N > 1, and singular value truncation parameter r. Output: compressed and trained weights.
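The compose-then-truncate step that Algorithm 1 describes can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function name `compose_and_truncate`, the width 96 (borrowed from the paper's constant layer dimension), and the rank 16 are assumptions for demonstration.

```python
import numpy as np

def compose_and_truncate(factors, r):
    """Compose N linear factors into one weight matrix, then keep only
    the top-r singular values (singular value truncation, SVT)."""
    # A composition of linear layers is itself linear: W = W_N @ ... @ W_1.
    W = factors[0]
    for F in factors[1:]:
        W = F @ W
    # Rank-r approximation of the composed weight via truncated SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Usage: three random 96x96 factors (N = 3), truncated to rank 16.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((96, 96)) for _ in range(3)]
W_compressed = compose_and_truncate(factors, r=16)
```

The compressed layer can then be stored as the two thin factors `U[:, :r] * s[:r]` and `Vt[:r, :]`, which is where the parameter and FLOPs savings come from.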
Open Source Code | Yes | Our code is available at https://github.com/XitongSystem/LoRITa/tree/main.
Open Datasets | Yes | Our experimental results (i) demonstrate the effectiveness of our approach using MNIST on Fully Connected Networks, CIFAR10 on Vision Transformers, and CIFAR10/100 and ImageNet on Convolutional Neural Networks...
Dataset Splits | No | The paper mentions using well-known datasets (MNIST, CIFAR10, CIFAR100, and ImageNet) but does not explicitly provide training/validation/test splits, percentages, or a methodology for reproducing the data partitioning for the main experiments. It mentions "120 randomly subsampled training data to compute E(l)" in Appendix A, but this concerns an internal iterative process, not the overall dataset split for model training and evaluation.
Hardware Specification | No | The paper states, "We use PyTorch to conduct our experiments," but does not provide any specific details regarding the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions "We use PyTorch to conduct our experiments," but does not specify the PyTorch version or any other software dependencies required to reproduce the experimental setup.
Experiment Setup | Yes | First, we evaluate our proposed method on fully connected neural networks, varying the number of layers, utilizing the Adam optimizer with a learning rate set to 1×10⁻², and employing a constant layer dimension of 96 (other than the last). Overparameterization is applied across all layers in the model. To ensure a fair comparison, we begin by tuning the baseline model (N = 1) across a range of weight decay parameters {5×10⁻⁶, 1×10⁻⁵, 2×10⁻⁴, 5×10⁻⁵, 1×10⁻⁴, 2×10⁻⁴}. Subsequently, we extend our exploration of weight decay within the same parameter range for models with N > 1. ... The learning rate applied in this evaluation is set to 3×10⁻⁴. The weight decay was searched over {1×10⁻², 5×10⁻³, 1×10⁻³} for CIFAR10 and {1×10⁻⁵, 5×10⁻⁵, 1×10⁻⁴} for CIFAR100. ... All the considered ViT models underwent optimization via the Adam optimizer with a learning rate of 3×10⁻⁴. The hidden dimension is 256 for all ViTs. ... we initially fine-tuned the baseline model (N = 1) across the following weight decay parameters {5×10⁻⁵, 1×10⁻⁴, 2×10⁻⁴, 5×10⁻⁴, 1×10⁻³, 2×10⁻³}.
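The baseline-tuning procedure described above (sweep a weight-decay grid, keep the best value) can be sketched as below. This is a minimal stand-in, not the paper's training code: the toy least-squares objective, `train_once`, and the step count are assumptions; only the learning rate 1×10⁻² and the weight-decay grid come from the reported setup.

```python
import numpy as np

def train_once(weight_decay, lr=1e-2, steps=200):
    """Toy stand-in for one training run: gradient descent on a random
    least-squares problem with an L2 (weight decay) penalty on w.
    Returns the final data-fit loss used to rank weight-decay values."""
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 96))  # width 96, as in the paper's FCNs
    y = rng.standard_normal(256)
    w = np.zeros(96)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + weight_decay * w
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

# Sweep the baseline (N = 1) over the reported weight-decay grid
# and keep the value with the lowest final loss.
grid = [5e-6, 1e-5, 2e-4, 5e-5, 1e-4, 2e-4]
best_wd = min(grid, key=train_once)
```

The same loop structure applies to the N > 1 models, which the paper tunes over the same grid after the baseline search.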