Learning $k$-Level Structured Sparse Neural Networks Using Group Envelope Regularization
Authors: Yehonathan Refael, Iftach Arbel, Wasim Huleihel
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experiment and illustrate the efficiency of our proposed method in terms of the compression ratio, accuracy, and inference latency. |
| Researcher Affiliation | Academia | Yehonathan Refael EMAIL Department of Electrical Engineering-Systems Tel Aviv University Iftach Arbel EMAIL Independent Researcher Wasim Huleihel EMAIL Department of Electrical Engineering-Systems Tel Aviv University |
| Pseudocode | Yes | Algorithm 1: General Stochastic Proximal Gradient Method Algorithm 2: Learning structured k-level sparse neural-network by Prox SGD with WGSEF regularization Algorithm 3: General Stochastic Proximal Gradient Method |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | These architectures were tested on the datasets CIFAR-10 Krizhevsky et al. (2009) and Fashion-MNIST Xiao et al. (2017). In this subsection, we compare our method to the state-of-the-art pruning techniques, which are often used as an alternative for model compression during (or, post) training. We train ResNet50 with the ImageNet dataset. We examine the effectiveness of the WGSEF in the LeNet-5 convolutional neural network LeCun et al. (1998) (the architecture is PyTorch and not Caffe and is given in Appendix A.5), on the MNIST dataset LeCun & Cortes (2010). In Table 5, we present the results when training both VGG16 and DenseNet40 Huang et al. (2018a) on CIFAR-100 Krizhevsky et al. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10, Fashion-MNIST, ImageNet, MNIST, and CIFAR-100, which have standard splits. However, it does not explicitly state the specific split percentages, sample counts, or a detailed methodology for how the data was partitioned for its experiments. For instance, it refers to 'validation datasets' but not their size or origin. |
| Hardware Specification | Yes | Experiments were conducted using a mini-batch size of b = 128 on an A100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the context of the LeNet-5 architecture but does not specify a version number for PyTorch or any other software libraries or solvers used in the experiments. |
| Experiment Setup | Yes | All experiments were conducted over 300 epochs. For the first 150 epochs, we employed Algorithm 2, and for the remaining epochs, we used the HSPG with the WGSEF acting as a regularizer (i.e., Algorithm 3). Experiments were conducted using a mini-batch size of b = 128 on an A100 GPU. The coefficient for the WGSE regularizer was set to λ = 10^-2. Again, to have a fair comparison, the baseline model was trained using SGD, both with an initial learning rate of α0 = 0.01, regularization magnitude λ = 0.03, a batch size of 128, and a cosine annealing learning rate scheduler. We train ResNet50 with the ImageNet dataset, using λ = 0.05, with an initial learning rate of α0 = 0.01, sparsity level k = 0.34, and use a cosine annealing learning rate scheduler. The networks were trained with a learning rate of 0.001, regularization magnitude λ = 10^-5, and a batch size of 32 for 150 epochs across 5 runs. We use a learning rate equal to 1e-4, with a batch size of 32, a momentum of 0.95, and 15 epochs. |
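The training recipe quoted above follows a stochastic proximal gradient pattern (Algorithms 1–2): a gradient step on the loss, followed by the proximal operator of the group-sparsity regularizer. A minimal NumPy sketch of that skeleton is given below. Note the hedges: `group_soft_threshold` implements the standard group-lasso prox as an illustrative stand-in, not the paper's WGSEF proximal operator (which additionally enforces a k-level group-sparsity budget), and the function names and group encoding are hypothetical, not from the paper.

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Prox of the group-lasso penalty lam * sum_g ||x_g||_2.

    Illustrative stand-in for the WGSEF prox in Algorithm 2: each group
    of weights is shrunk toward zero, and groups whose norm falls below
    lam are zeroed out entirely (structured sparsity).
    """
    out = x.copy()
    for g in groups:
        norm = np.linalg.norm(x[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = scale * x[g]
    return out

def prox_sgd_step(w, grad, groups, lr=0.01, lam=0.03):
    """One stochastic proximal-gradient step: SGD step, then prox.

    lr and lam mirror the quoted setup (initial learning rate 0.01,
    regularization magnitude 0.03); the prox is applied with step-size
    scaling lr * lam, as is standard for proximal gradient methods.
    """
    return group_soft_threshold(w - lr * grad, groups, lr * lam)

# Example: group [0, 1] survives shrinkage, group [2, 3] is pruned.
w = np.array([3.0, 4.0, 0.1, 0.1])
w_new = group_soft_threshold(w, groups=[[0, 1], [2, 3]], lam=1.0)
# -> [2.4, 3.2, 0.0, 0.0]
```

In an actual run, `prox_sgd_step` would be applied per mini-batch, with `groups` indexing e.g. the filters of a convolutional layer so that whole channels are removed together.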