DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle
Authors: Lichuan Xiang, Quan Nguyen-Tri, Lan-Cuong Nguyen, Hoang Pham, Khoat Than, Long Tran-Thanh, Hongkai Wen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that DPaI significantly outperforms current state-of-the-art PaI methods on various architectures, such as Convolutional Neural Networks and Vision Transformers. Code is available at https://github.com/QuanNguyen-Tri/DPaI.git |
| Researcher Affiliation | Collaboration | ¹University of Warwick, ²Hanoi University of Science and Technology, ³FPT Software AI Center, ⁴Collov Labs |
| Pseudocode | Yes | Algorithm 1 Differentiable Pruning at Initialization (DPaI) (a generic differentiable-masking sketch follows the table) |
| Open Source Code | Yes | Code is available at https://github.com/QuanNguyen-Tri/DPaI.git |
| Open Datasets | Yes | Our main experiments are conducted with CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets, where: CIFAR-10 is augmented by normalizing per-channel, randomly flipping horizontally. CIFAR-100 is augmented by normalizing per-channel, randomly flipping horizontally. Tiny-ImageNet is augmented by normalizing per-channel, cropping to 64x64, and randomly flipping horizontally. We also perform experiments on ImageNet-1K (Deng et al., 2009) to verify our methods work on large-scale dataset tasks. (A transform sketch for these augmentations follows the table.) |
| Dataset Splits | Yes | Same passage as quoted for Open Datasets above. |
| Hardware Specification | Yes | We use the PyTorch library and conduct experiments on a single A5000. Each model was trained using three random seeds (0, 1, 2) to ensure robustness, and the model was trained on an Nvidia A100. |
| Software Dependencies | No | We use the PyTorch library and conduct experiments on a single A5000. |
| Experiment Setup | Yes | For training the final sparse network, the hyperparameters are chosen as follows (Table 6: Summary of the architectures, datasets, and hyperparameters used in experiments): VGG-19 on CIFAR-100: 160 epochs, batch 128, SGD, momentum 0.9, LR 0.1, 10x LR drop at epochs [60, 120], weight decay 0.0001. ResNet-20 on CIFAR-10: 160 epochs, batch 128, SGD, momentum 0.9, LR 0.1, 10x LR drop at epochs [60, 120], weight decay 0.0001. ResNet-18 on Tiny-ImageNet: 100 epochs, batch 128, SGD, momentum 0.9, LR 0.01, 10x LR drop at epochs [30, 60, 80], weight decay 0.0001. (A training-setup sketch follows the table.) |
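The pseudocode row cites the paper's Algorithm 1, which is not reproduced in this report. As a rough illustration of the general differentiable pruning-at-initialization pattern (learn continuous mask scores over frozen initial weights, then binarize to a sparsity budget), here is a minimal PyTorch sketch. The toy two-layer model, the cross-entropy stand-in objective, and all names are assumptions for illustration only; this does not implement DPaI's node-path balance objective.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setting: a 2-layer MLP at random initialization.  We learn a soft mask
# per weight and keep only the globally top-k scored weights.  Hypothetical
# sketch -- NOT the paper's Algorithm 1; the task loss is a generic stand-in.
d_in, d_h, d_out, n = 20, 64, 10, 128
W1 = torch.randn(d_h, d_in) * 0.1   # frozen initial weights (no grad)
W2 = torch.randn(d_out, d_h) * 0.1
x = torch.randn(n, d_in)
y = torch.randint(0, d_out, (n,))

s1 = torch.zeros_like(W1, requires_grad=True)  # learnable mask logits
s2 = torch.zeros_like(W2, requires_grad=True)
opt = torch.optim.Adam([s1, s2], lr=0.1)

keep_ratio = 0.10
for step in range(200):
    opt.zero_grad()
    m1, m2 = torch.sigmoid(s1), torch.sigmoid(s2)        # soft masks in (0, 1)
    logits = F.relu(x @ (m1 * W1).t()) @ (m2 * W2).t()
    task_loss = F.cross_entropy(logits, y)
    # Penalty keeps the expected density near the sparsity budget.
    density = (m1.sum() + m2.sum()) / (m1.numel() + m2.numel())
    loss = task_loss + 10.0 * (density - keep_ratio) ** 2
    loss.backward()
    opt.step()

# Binarize: keep the top-k scored weights across both layers.
all_scores = torch.cat([s1.flatten(), s2.flatten()])
k = int(keep_ratio * all_scores.numel())
threshold = torch.topk(all_scores, k).values.min()
mask1, mask2 = (s1 >= threshold).float(), (s2 >= threshold).float()
print(f"final density: {(mask1.sum() + mask2.sum()) / all_scores.numel():.3f}")
```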
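For the augmentations quoted under Open Datasets, a torchvision sketch of the described pipeline follows. The normalization statistics and the padding used with the 64x64 crop are commonly used defaults assumed here; the quoted text does not state them.

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# As described: per-channel normalization + random horizontal flip for
# CIFAR-10/100; Tiny-ImageNet adds a crop to 64x64.  Mean/std values are the
# usual published statistics, assumed rather than taken from the paper.
cifar10_train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

tiny_imagenet_train_tf = T.Compose([
    T.RandomCrop(64, padding=4),   # "cropping to 64x64"; padding=4 is an assumption
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=(0.4802, 0.4481, 0.3975), std=(0.2770, 0.2691, 0.2821)),
])

train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=cifar10_train_tf)
```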
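And for the Experiment Setup row, a sketch of the VGG-19 / CIFAR-100 configuration from Table 6: SGD with momentum 0.9, LR 0.1 dropped 10x at epochs 60 and 120, weight decay 0.0001, batch size 128, 160 epochs. The model and data below are stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for VGG-19 and the CIFAR-100 loader; only the hyperparameters
# below are taken from Table 6.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 100, (512,))),
    batch_size=128,  # batch size from Table 6
)

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(160):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()   # 10x LR drop at epochs 60 and 120
```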