AP: Selective Activation for De-sparsifying Pruned Networks

Authors: Shiyu Liu, Rohan Ghosh, Mehul Motani

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive experiments using various popular networks (e.g., ResNet, VGG, DenseNet, MobileNet) via two classical and three competitive pruning methods. The experimental results on public datasets (e.g., CIFAR-10, CIFAR-100) suggest that AP works well with existing pruning methods and improves the performance by 3% - 4%. For larger scale datasets (e.g., ImageNet) and competitive networks (e.g., vision transformer), we observe an improvement of 2% - 3% with AP as opposed to without. Lastly, we conduct an ablation study and a substitution study to examine the effectiveness of the components comprising AP.
Researcher Affiliation Academia Shiyu Liu EMAIL, Department of Electrical and Computer Engineering, College of Design and Engineering, National University of Singapore; Rohan Ghosh EMAIL, Department of Electrical and Computer Engineering, National University of Singapore; and Mehul Motani EMAIL, Department of Electrical and Computer Engineering, College of Design and Engineering, N.1 Institute for Health, Institute of Data Science, Institute for Digital Medicine (WisDM), National University of Singapore
Pseudocode Yes Algorithm 1: The Pruning Metric of the Proposed AP; Algorithm 2: The Pruning Method X with and without AP
Open Source Code Yes The source code is available at https://github.com/Martin1937/Activate-While-Pruning.
Open Datasets Yes We conduct experiments on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) with various popular networks (e.g., ResNet, VGG (Simonyan & Zisserman, 2014), MobileNet (Sandler et al., 2018), DenseNet (Huang et al., 2017)) using two classical and three competitive pruning methods. The results demonstrate that AP works well with existing pruning methods and improves their performance by 3% - 4%. For the larger scale dataset (e.g., ImageNet (Deng et al., 2009)) and competitive networks (e.g., vision transformer (Dosovitskiy et al., 2020)), we observe an improvement of 2% - 3% with AP as opposed to without.
Dataset Splits Yes We conduct pruning experiments using ResNet-20 on the CIFAR-10 dataset with the aim of examining the benefit (or lack thereof) of ReLU's sparsity for pruned networks. ... To ensure fair comparison against prior results, we utilize standard implementations (i.e., network hyper-parameters and learning rate schedules) reported in the literature. Specifically, the implementations for Tables 1 - 6 are from (Frankle & Carbin, 2019), (Zhao et al., 2019), (Chin et al., 2020), (Renda et al., 2019) and (Dosovitskiy et al., 2020).
Hardware Specification Yes We use Tesla V100 devices to conduct our experiments.
Software Dependencies No The paper mentions using SGD optimizer, He initialization, and refers to implementations reported in the literature, but does not provide specific software library version numbers.
Experiment Setup Yes We train the network using SGD with He initialization (He et al., 2015), momentum = 0.9 and a weight decay of 1e-4 (same as (Renda et al., 2019; Frankle & Carbin, 2019)). For the benchmark pruning method, we prune the network with a pruning rate p = 20% (i.e., 20% of existing weights are pruned) in 1 pruning cycle. ... (i) The training batch size is tuned from {64, 128, ..., 1024}. (ii) The learning rate is tuned from 1e-3 to 1e-1 with a step size of 2e-3. (iii) The number of training epochs is tuned from 80 to 500 with a step size of 20.
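The pruning-rate convention quoted above (a rate p = 20% removes 20% of the weights still present in a given cycle) can be sketched as follows. This is a minimal, framework-free illustration of one magnitude-pruning cycle under that convention; the helper name `magnitude_prune` is hypothetical and this is not the paper's AP metric.

```python
def magnitude_prune(weights, mask, p=0.2):
    """Prune a fraction p of the *remaining* (unmasked) weights by magnitude.

    weights : flat list of weight values
    mask    : flat list of 1 (kept) / 0 (already pruned) entries
    Returns a new mask with the p smallest-magnitude surviving weights zeroed.
    Hypothetical helper for illustration only.
    """
    # Collect surviving weights with their indices, smallest magnitude first.
    remaining = sorted((abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m)
    n_prune = int(len(remaining) * p)  # 20% of *existing* weights per cycle
    new_mask = list(mask)
    for _, i in remaining[:n_prune]:
        new_mask[i] = 0
    return new_mask

# One pruning cycle at p = 20%: the smallest-magnitude surviving weight is removed.
w = [0.5, -0.1, 0.3, 0.05, -0.4]
m = magnitude_prune(w, [1, 1, 1, 1, 1], p=0.2)
```

Because the rate applies to surviving weights, repeated cycles remove geometrically fewer weights each time (20% of 100%, then 20% of the remaining 80%, and so on), which matches the iterative schedules of the cited pruning baselines.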