Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Authors: Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimented our approach through several image learning benchmarks and demonstrate its interesting aspects and competitive performance. We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). We describe an image segmentation experiment whose neural network model is composed from an inception-like network branch and a densenet network branch. |
| Researcher Affiliation | Collaboration | Jianbo Ye College of Information Sciences and Technology The Pennsylvania State University EMAIL Xin Lu, Zhe Lin Adobe Research EMAIL James Z. Wang College of Information Sciences and Technology The Pennsylvania State University EMAIL |
| Pseudocode | Yes | 4.2 THE ALGORITHM We describe our algorithm below. The following method applies to both training from scratch or re-training from a pre-trained model. Given a training loss l, a convolutional neural net N, and hyper-parameters ρ, α, µ0, our method proceeds as follows: 1. Computation of sparse penalty for each layer. ... 2. γ-W rescaling trick. ... 3. End-to-End training with ISTA on γ. ... 4. Post-process to remove constant channels. ... 5. γ-W rescaling trick. ... 6. Fine-tune Ñ using regular stochastic gradient learning. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We experiment with the standard image classification benchmark CIFAR-10 with two different network architectures: ConvNet and ResNet-20 (He et al., 2016). We experiment our approach with the pre-trained ResNet-101 on ILSVRC2012 image classification dataset (He et al., 2016). This model was originally trained on multiple datasets. COCO-person (Lin et al., 2014) |
| Dataset Splits | No | The paper mentions 'test accuracy' and 'test datasets' but does not explicitly describe the training, validation, and test splits (e.g., percentages or counts) for the datasets used. |
| Hardware Specification | No | The paper mentions 'across 4 GPUs' in relation to batch size, but does not specify the model or type of GPUs used for the experiments. |
| Software Dependencies | No | The paper mentions 'pre-trained TensorFlow ResNet-101 model' but does not specify the version of TensorFlow or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use a fixed learning rate µt = 0.01, scaling parameter α = 1.0, and set batch size to 125. We use a warm-up ρ = 0.001 for 30k steps and then train with ρ = 0.005. We set the scaling parameter α = 0.01, the initial learning rate µt = 0.001, the sparsity penalty ρ = 0.1, and the batch size = 128 (across 4 GPUs). The learning rate is decayed every four epochs with rate 0.86. We set α = 0.01, ρ = 0.5, µt = 2×10⁻⁵, and batch size = 24. |
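The pseudocode row above lists "End-to-End training with ISTA on γ" as step 3 of the paper's algorithm: a gradient step on the task loss followed by soft-thresholding of the batch-norm scaling factors γ, which drives some channels' γ exactly to zero so those channels can be pruned. A minimal NumPy sketch of that proximal update (the function names and example values are illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink toward zero by t,
    clipping anything with magnitude below t to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_step(gamma, grad, mu=0.01, rho=0.005):
    """One ISTA update on batch-norm scaling factors gamma:
    gradient descent on the task loss, then soft-thresholding with
    threshold mu * rho (hyperparameter values follow the CIFAR-10
    setup quoted in the table; the example gradients are made up)."""
    return soft_threshold(gamma - mu * grad, mu * rho)

# Per-channel gamma for one layer; the last channel is nearly dead.
gamma = np.array([0.8, 0.003, -0.5, 1e-5])
grad = np.array([0.1, 0.1, -0.1, 0.0005])
gamma = ista_step(gamma, grad)

# Channels whose gamma reaches exactly zero output a constant and
# can be removed in the paper's post-processing step.
prunable = np.isclose(gamma, 0.0)
```

Repeated over training, the ℓ1 penalty ρ keeps pushing small γ values to exactly zero, unlike plain weight decay, which only shrinks them asymptotically.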