Max-Affine Spline Insights Into Deep Network Pruning
Authors: Haoran You, Randall Balestriero, Zhihan Lu, Yutong Kou, Huihong Shi, Shunyao Zhang, Shang Wu, Yingyan Lin, Richard Baraniuk
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four networks and three datasets validate that our new spline-based DN pruning approach reduces training FLOPs by up to 3.5× while achieving similar or even better accuracy than current state-of-the-art methods. Code is available at https://github.com/RICE-EIC/Spline-EB. |
| Researcher Affiliation | Collaboration | Rice University; Meta AI Research; Huazhong University of Science and Technology; Nanjing University |
| Pseudocode | Yes | Appendix A (Algorithm for Searching Spline EB Tickets), Algorithm 1: The Algorithm for Searching Spline EB Tickets |
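Algorithm 1 itself is not reproduced in the table above. As a rough, hedged illustration only: the paper's spline EB ticket search builds on the early-bird ticket pattern of (You et al., 2020), where training can stop early once the binary pruning masks drawn at consecutive epochs stabilize. The function names, window size, and threshold below are illustrative assumptions, not the paper's exact spline-based criterion.

```python
def mask_distance(m1, m2):
    """Normalized Hamming distance between two equal-length binary pruning masks."""
    assert len(m1) == len(m2)
    return sum(a != b for a, b in zip(m1, m2)) / len(m1)

def found_eb_ticket(mask_history, window=5, eps=0.1):
    """Declare an early-bird ticket once the masks from the last `window`
    epochs all lie within distance `eps` of the newest mask.

    `mask_history` is a list of binary masks, one per epoch (oldest first).
    """
    if len(mask_history) < window:
        return False
    newest = mask_history[-1]
    return all(mask_distance(m, newest) < eps
               for m in mask_history[-window:-1])
```

Under this pattern, the ticket (the stabilized mask) is extracted and the pruned network is retrained, which is what yields the training-FLOPs savings the paper reports.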
| Open Source Code | Yes | Code is available at https://github.com/RICE-EIC/Spline-EB. |
| Open Datasets | Yes | Models & Datasets: We consider four DNN models (PreResNet-101, VGG-16, and ResNet-18/50) on both the CIFAR-10/100 and ImageNet datasets following the basic setting of (You et al., 2020). |
| Dataset Splits | Yes | Models & Datasets: We consider four DNN models (PreResNet-101, VGG-16, and ResNet-18/50) on both the CIFAR-10/100 and ImageNet datasets following the basic setting of (You et al., 2020). |
| Hardware Specification | Yes | All experiments are run on a server with ten NVIDIA 2080 Ti GPUs. ... measured by training the models on an edge GPU (NVIDIA JETSON TX2). |
| Software Dependencies | No | The paper mentions using an SGD solver with specific parameters (momentum, weight decay) but does not provide specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | For the CIFAR-10/100 datasets, the training takes a total of 160 epochs; and the initial learning rate is set to 0.1 and is divided by 10 at the 80-th and 120-th epochs, respectively. For the Image Net dataset, the training takes a total of 90 epochs while the learning rate drops at the 30-th and 60-th epochs, respectively. In all the experiments, the batch size is set to 256, and an SGD solver is adopted with a momentum of 0.9 and a weight decay of 0.0001, following the setting of (Liu et al., 2019c). Additionally, ρ in Equ. 2 is set to 0.05 for all experiments except for the ablation studies. |
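The quoted training recipe amounts to a standard step learning-rate schedule with SGD. A minimal sketch of the schedule in plain Python, using only the hyperparameters stated above (base LR 0.1, decay factor 10 at epochs 80/120 for CIFAR-10/100, or 30/60 for ImageNet):

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(80, 120), gamma=0.1):
    """Step LR schedule: multiply the learning rate by `gamma` at each
    milestone epoch. Defaults match the quoted CIFAR-10/100 setting;
    pass milestones=(30, 60) for the quoted ImageNet setting."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

In a PyTorch training loop this would correspond to `torch.optim.SGD(..., lr=0.1, momentum=0.9, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)`, with batch size 256; the repository linked above should be treated as the authoritative configuration.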