Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Explicit Group Sparse Projection with Applications to Deep Learning and NMF
Authors: Riyasat Ohib, Nicolas Gillis, Niccolo Dalmasso, Sameena Shah, Vamsi K. Potluru, Sergey Plis
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the efficacy of our approach in both supervised and unsupervised learning tasks on image datasets including CIFAR10 and ImageNet. In deep neural network pruning, the sparse models produced by our method on ResNet50 have significantly higher accuracies at corresponding sparsity values compared to existing competitors. In nonnegative matrix factorization, our approach yields competitive reconstruction errors against state-of-the-art algorithms. |
| Researcher Affiliation | Collaboration | Riyasat Ohib EMAIL TReNDS Center, Georgia Institute of Technology Nicolas Gillis EMAIL University of Mons Niccolò Dalmasso EMAIL J.P. Morgan AI Research Sameena Shah EMAIL J.P. Morgan AI Research Vamsi K. Potluru EMAIL J.P. Morgan AI Research Sergey Plis EMAIL TReNDS Center |
| Pseudocode | Yes | Algorithm 1 summarizes the GSP algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/riohib/gsp-for-deeplearning |
| Open Datasets | Yes | We showcase the efficacy of our approach in both supervised and unsupervised learning tasks on image datasets including CIFAR10 and ImageNet. VGG16 on CIFAR-10 (Krizhevsky et al., 2009) dataset... ILSVRC2012 ImageNet dataset (Russakovsky et al., 2015)... CBCL facial image data ... used in the seminal work by Lee & Seung (1999) |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet datasets for training and testing, and describes preprocessing steps like horizontal flip, random crop, normalization, and augmentation. However, it does not explicitly state the dataset splits (e.g., percentages or exact counts for training, validation, and testing sets) in the main text or appendices. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'GPU-compatible implementation' in the context of code. |
| Software Dependencies | No | The paper mentions 'PyTorch' as a framework for the GPU-compatible implementation, but does not specify any version numbers for PyTorch or other software libraries or dependencies. While a GitHub link is provided for the code, explicit versioning of software dependencies is not detailed in the paper's text. |
| Experiment Setup | Yes | For the experiments with intermittent projections during the training phase, we project the weights of the VGG16 model using GSP with sparsity level s, perform a forward pass on the projected weights and finally update the model parameters using backpropagation every 150 iterations for 200 epochs, starting from epoch 40. We reduce the learning rate by a factor of 0.1 at the milestone epochs of 80, 120 and 160. Next, we set the s fraction of the lowest parameters of the model to zero. At this point the model is sparse with a layerwise Hoyer sparsity of s. However, since we project intermittently and with Hoyer sparsity being a differentiable approximation to the ℓ0 norm, we then prune the surviving weights that are close to zero or zero, keeping the largest 1 − s fraction of the parameters, where s is the final sparsity of the model. Finally, we finetune the surviving parameters for 200 epochs with a learning rate of 0.01 and dropping the rate by 0.1 in the same milestones as the sparsity inducing run. For the experiments with induced-GSP, we project the weights of the ResNet50 model in a similar technique to the experiments performed with the CIFAR-10 dataset. We first project the layers of the model using GSP with s = 0.80, perform a forward pass on the projected weights and finally update the model parameters using backpropagation every 500 iterations for 120 epochs. We start the projection of the model from epoch 40 and keep projecting every 500 iterations till the final epoch. In both the cases of CIFAR-10 and ImageNet we choose the iteration interval of projection in such a way so that there are 3 projections per epoch. We reduce the learning rate by a factor of 0.1 at the milestone epochs of 70 and 100. Next, we set the s fraction of the lowest parameters of the model to zero. We next prune the surviving weights that are close to zero or zero, keeping the largest 1 − s fraction of the parameters, where s is the final sparsity of the model. Finally, we finetune the surviving parameters for 140 epochs with a learning rate of 0.001 and dropping the rate by 0.1 in the same milestones as the inducing-GSP run. |
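The quoted setup ends with a magnitude-pruning step that keeps the largest 1 − s fraction of parameters, measured against the Hoyer sparsity of the layer. As a rough illustration only (this is not the paper's GSP projection, whose algorithm is given in the paper and repository), a NumPy sketch of magnitude pruning at target sparsity `s` and the Hoyer sparsity measure might look like:

```python
import numpy as np

def magnitude_prune(weights, s):
    """Zero the smallest s fraction of entries by magnitude,
    keeping the largest (1 - s) fraction, as in the quoted setup."""
    flat = np.abs(weights).ravel()
    k = int(s * flat.size)          # number of entries to zero out
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)

def hoyer_sparsity(w):
    """Hoyer sparsity: ~0 for a dense uniform vector, 1 for a 1-sparse vector.
    A smooth (l1/l2-based) proxy for the l0 norm."""
    w = w.ravel().astype(float)
    n = w.size
    l1 = np.abs(w).sum()
    l2 = np.sqrt((w ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, s=0.80)
print(np.mean(pruned == 0))    # fraction of zeroed entries, close to 0.80
print(hoyer_sparsity(pruned))  # higher than hoyer_sparsity(w)
```

The function names here are placeholders; the actual training loop additionally interleaves GSP projections every 150 (CIFAR-10) or 500 (ImageNet) iterations before this final pruning and finetuning stage.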