DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle

Authors: Lichuan Xiang, Quan Nguyen-Tri, Lan-Cuong Nguyen, Hoang Pham, Khoat Than, Long Tran-Thanh, Hongkai Wen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results demonstrate that DPaI significantly outperforms current state-of-the-art PaI methods on various architectures, such as Convolutional Neural Networks and Vision Transformers. Code is available at https://github.com/QuanNguyen-Tri/DPaI.git
Researcher Affiliation | Collaboration | (1) University of Warwick; (2) Hanoi University of Science and Technology; (3) FPT Software AI Center; (4) Collov Labs
Pseudocode | Yes | Algorithm 1: Differentiable Pruning at Initialization (DPaI)
Open Source Code | Yes | Code is available at https://github.com/QuanNguyen-Tri/DPaI.git
Open Datasets | Yes | Our main experiments are conducted with the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets, where: CIFAR-10 is augmented by normalizing per-channel and randomly flipping horizontally; CIFAR-100 is augmented by normalizing per-channel and randomly flipping horizontally; Tiny-ImageNet is augmented by normalizing per-channel, cropping to 64x64, and randomly flipping horizontally. We also perform experiments on ImageNet-1K (Deng et al., 2009) to verify that our method works on large-scale tasks.
Dataset Splits | Yes | Our main experiments are conducted with the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets, where: CIFAR-10 is augmented by normalizing per-channel and randomly flipping horizontally; CIFAR-100 is augmented by normalizing per-channel and randomly flipping horizontally; Tiny-ImageNet is augmented by normalizing per-channel, cropping to 64x64, and randomly flipping horizontally. We also perform experiments on ImageNet-1K (Deng et al., 2009) to verify that our method works on large-scale tasks.
Hardware Specification | Yes | We use the PyTorch library and conduct experiments on a single NVIDIA A5000. Each model was trained using three random seeds (0, 1, 2) to ensure robustness, and the model was trained on an NVIDIA A100.
Software Dependencies | No | We use the PyTorch library and conduct experiments on a single NVIDIA A5000.
Experiment Setup | Yes | For training the final sparse network, the hyperparameters are chosen as follows (Table 6: Summary of the architectures, datasets, and hyperparameters used in experiments):

Network   | Dataset       | Epochs | Batch | Optimizer | Momentum | LR   | LR Drop, Epoch  | Weight Decay
VGG-19    | CIFAR-100     | 160    | 128   | SGD       | 0.9      | 0.1  | 10x, [60,120]   | 0.0001
ResNet-20 | CIFAR-10      | 160    | 128   | SGD       | 0.9      | 0.1  | 10x, [60,120]   | 0.0001
ResNet-18 | Tiny-ImageNet | 100    | 128   | SGD       | 0.9      | 0.01 | 10x, [30,60,80] | 0.0001
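The "LR Drop, Epoch" column describes a step schedule: the learning rate is divided by 10 at each listed milestone epoch. A minimal, dependency-free sketch of that schedule is below; `lr_at_epoch` is a hypothetical helper name introduced here for illustration, not a function from the paper or its released code.

```python
# Hedged sketch (assumption: standard multi-step LR decay as in Table 6,
# i.e. multiply the LR by gamma = 0.1 at each milestone epoch).

def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Return the learning rate in effect at `epoch` under a step schedule
    that scales the LR by `gamma` once per milestone already reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# VGG-19 / CIFAR-100 row: base LR 0.1, dropped 10x at epochs 60 and 120.
print(lr_at_epoch(0, 0.1, [60, 120]))    # -> 0.1
print(lr_at_epoch(60, 0.1, [60, 120]))   # ~0.01 after the first drop
print(lr_at_epoch(159, 0.1, [60, 120]))  # ~0.001 after both drops

# ResNet-18 / Tiny-ImageNet row: base LR 0.01, drops at epochs 30, 60, 80.
print(lr_at_epoch(85, 0.01, [30, 60, 80]))  # ~1e-5 after all three drops
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 120]` and `gamma=0.1` wrapped around an `SGD` optimizer configured per the table (momentum 0.9, weight decay 0.0001).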