A Bregman Learning Framework for Sparse Neural Networks

Authors: Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 4 we first discuss our statistical sparse initialization strategy and then evaluate our algorithms on benchmark data sets (MNIST, Fashion-MNIST, CIFAR-10) using feedforward, convolutional, and residual neural networks. Table 1 shows that all algorithms manage to compute very sparse networks with ca. 2% drop in test accuracy on Fashion-MNIST, compared to vanilla dense training with Adam. Table 3 shows the resulting sparsity levels of the total number of parameters and the percentage of non-zero convolutional kernels as well as the train and test accuracies."
Researcher Affiliation | Academia | Leon Bungert, Hausdorff Center for Mathematics, University of Bonn, Endenicher Allee 62, Villa Maria, 53115 Bonn, Germany. Tim Roith, Department of Mathematics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 11, 91058 Erlangen, Germany.
Pseudocode | Yes | "Algorithm 1: LinBreg, an inverse scale space algorithm for training sparse neural networks by successively adding weights whilst minimizing the loss. Algorithm 2: LinBreg with momentum, an acceleration of LinBreg using momentum-based gradient memory. Algorithm 3: AdaBreg, a Bregman version of the Adam algorithm which uses moment-based bias correction."
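The core of the LinBreg update (Algorithm 1) can be illustrated with a pure-Python toy: a dual variable accumulates gradients while the primal weights are obtained by soft-thresholding, so weights "activate" one at a time (inverse scale space). This is a minimal sketch on a separable quadratic loss, not the authors' implementation; the loss, `tau`, `delta`, and all function names are illustrative assumptions.

```python
# Toy sketch of a linearized Bregman iteration in the spirit of LinBreg.
# Loss: L(theta) = 0.5 * sum((theta_i - t_i)^2) for a fixed target t.
# The dual variable v accumulates gradient steps; theta = shrink(v, delta)
# applies soft-thresholding, keeping weights exactly zero until their
# accumulated gradient signal exceeds delta.

def shrink(x, delta):
    """Soft-thresholding: the proximal map of delta * |.|_1, applied elementwise."""
    return [max(abs(xi) - delta, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]

def linbreg(target, tau=0.1, delta=0.5, iters=500):
    n = len(target)
    v = [0.0] * n       # dual (subgradient) variable
    theta = [0.0] * n   # primal weights, start fully sparse
    for _ in range(iters):
        grad = [th - t for th, t in zip(theta, target)]  # gradient of the toy loss
        v = [vi - tau * gi for vi, gi in zip(v, grad)]   # dual gradient step
        theta = shrink(v, delta)                         # sparse primal update
    return theta

weights = linbreg([3.0, 0.0, -2.0, 0.0])
# components with no gradient signal never leave zero, so sparsity is preserved
```

On this toy problem the iteration recovers the non-zero target entries while the zero entries stay exactly zero, which is the qualitative behaviour the paper exploits for sparse training.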
Open Source Code | Yes | Code is available at https://github.com/TimRoith/BregmanLearning.
Open Datasets | Yes | "We consider the classification task on the MNIST dataset (LeCun and Cortes, 2010) for studying the impact of the hyperparameters of these methods. The set consists of 60,000 images of handwritten digits which we split into 55,000 images used for the training and 5,000 images used for a validation process during training. We train a fully connected net with ReLU activations and two hidden layers (200 and 80 neurons), and use the ℓ1-regularization from (1.13), on the MNIST dataset... In this example we apply our algorithms to a convolutional neural network... to solve the classification task on Fashion-MNIST. In this experiment we trained a ResNet-18 architecture for classification on CIFAR-10."
Dataset Splits | Yes | "The set consists of 60,000 images of handwritten digits which we split into 55,000 images used for the training and 5,000 images used for a validation process during training."
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or types) are provided in the paper.
Software Dependencies | No | "Our code is available on GitHub at https://github.com/TimRoith/BregmanLearning and relies on PyTorch (Paszke et al., 2019)." No specific PyTorch version number is given.
Experiment Setup | Yes | "The learning rate is chosen as τ = 0.1 and is multiplied by a factor of 0.5 whenever the validation accuracy stagnates. We initialize the weights with 1% non-zero entries, i.e., r = 0.01."
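The two setup choices quoted above can be sketched in a few lines of pure Python: an initialization with a fraction r = 0.01 of non-zero weights, and a schedule that multiplies the learning rate by 0.5 when validation accuracy stagnates. The function names, the uniform weight density, the `patience` window, and the tolerance `eps` are illustrative assumptions, not details from the paper.

```python
import random

def sparse_init(n, r=0.01, scale=1.0, seed=0):
    """Return n weights with only round(r * n) non-zero entries (rest exactly 0)."""
    rng = random.Random(seed)
    nonzero = set(rng.sample(range(n), max(1, round(r * n))))
    # non-zero entries drawn from a scaled uniform distribution (placeholder density)
    return [rng.uniform(-scale, scale) if i in nonzero else 0.0 for i in range(n)]

def decay_on_plateau(lr, val_acc_history, factor=0.5, patience=3, eps=1e-4):
    """Multiply lr by factor if validation accuracy has not improved
    over the last `patience` epochs."""
    if len(val_acc_history) > patience:
        recent = max(val_acc_history[-patience:])
        before = max(val_acc_history[:-patience])
        if recent <= before + eps:
            return lr * factor
    return lr

w = sparse_init(10_000, r=0.01)                          # 1% non-zero weights
lr = decay_on_plateau(0.1, [0.90, 0.91, 0.91, 0.91, 0.91])  # stagnating accuracy
```

In a PyTorch training loop the same plateau behaviour would typically be delegated to `torch.optim.lr_scheduler.ReduceLROnPlateau`; the sketch above only makes the logic explicit.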