Max-Affine Spline Insights Into Deep Network Pruning

Authors: Haoran You, Randall Balestriero, Zhihan Lu, Yutong Kou, Huihong Shi, Shunyao Zhang, Shang Wu, Yingyan Lin, Richard Baraniuk

TMLR 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Extensive experiments on four networks and three datasets validate that our new spline-based DN pruning approach reduces training FLOPs by up to 3.5× while achieving similar or even better accuracy than current state-of-the-art methods. Code is available at https://github.com/RICE-EIC/Spline-EB."

Researcher Affiliation | Collaboration | Rice University; Meta AI Research; Huazhong University of Science and Technology; Nanjing University

Pseudocode | Yes | "Algorithm 1: The Algorithm for Searching Spline EB Tickets"

Open Source Code | Yes | "Code is available at https://github.com/RICE-EIC/Spline-EB."

Open Datasets | Yes | "Models & Datasets: We consider four DNN models (PreResNet-101, VGG-16, and ResNet-18/50) on both the CIFAR-10/100 and ImageNet datasets following the basic setting of (You et al., 2020)."

Dataset Splits | Yes | "Models & Datasets: We consider four DNN models (PreResNet-101, VGG-16, and ResNet-18/50) on both the CIFAR-10/100 and ImageNet datasets following the basic setting of (You et al., 2020)."

Hardware Specification | Yes | "All experiments are run in a server with ten NVIDIA 2080 Ti GPUs." "... measured by training the models on an edge GPU (NVIDIA JETSON TX2)"

Software Dependencies | No | The paper mentions using an SGD solver with specific parameters (momentum, weight decay) but does not provide specific software dependencies with version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow).

Experiment Setup | Yes | "For the CIFAR-10/100 datasets, the training takes a total of 160 epochs; the initial learning rate is set to 0.1 and is divided by 10 at the 80-th and 120-th epochs, respectively. For the ImageNet dataset, the training takes a total of 90 epochs, with the learning rate dropping at the 30-th and 60-th epochs. In all the experiments, the batch size is set to 256, and an SGD solver is adopted with a momentum of 0.9 and a weight decay of 0.0001, following the setting of (Liu et al., 2019c). Additionally, ρ in Equ. 2 is set to 0.05 for all experiments except for the ablation studies."
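The quoted learning-rate schedule (divide by 10 at fixed milestone epochs) can be sketched in plain Python. This is a minimal illustration of the step schedule described above; the function name `step_lr` and its defaults are assumptions for the sketch, not names from the paper's released code:

```python
def step_lr(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Piecewise-constant schedule: multiply base_lr by gamma at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# CIFAR-10/100 setting from the quote: 160 epochs, drops at epochs 80 and 120.
cifar_lrs = [step_lr(e) for e in range(160)]

# ImageNet setting: 90 epochs, drops at epochs 30 and 60.
imagenet_lrs = [step_lr(e, milestones=(30, 60)) for e in range(90)]
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[80, 120]` and `gamma=0.1` on top of an `SGD` optimizer (`momentum=0.9`, `weight_decay=1e-4`).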