Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions

Authors: Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our numerical experiments show that our trained networks compare favorably with existing 1-Lipschitz neural architectures. [...] Application: First, we systematically assess the practical expressivity of various 1-Lip architectures based on function fitting, Wasserstein-1 distance estimation and Wasserstein GAN training. Then, as our main application, we perform image reconstruction within the popular PnP framework.
Researcher Affiliation Academia Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser; Biomedical Imaging Group, École polytechnique fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
Pseudocode No The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. Methods are described using prose and mathematical equations.
Open Source Code Yes Our code is accessible on GitHub: https://github.com/StanislasDucotterd/Lipschitz_DSNN
Open Datasets Yes MNIST: Here, P1 is a uniform distribution over a set of real MNIST images (http://yann.lecun.com/exdb/mnist/) and P2 is the generator distribution of a GAN trained to generate MNIST images. [...] The training dataset consists of 238400 patches of size (40×40) taken from the BSD500 image dataset (Arbelaez et al., 2011). [...] we use fully sampled knee MR images of size (320×320) from the fastMRI dataset (Knoll et al., 2020) as ground truths. [...] The ground truth comes from human abdominal CT scans for 10 patients provided by Mayo Clinic for the low-dose CT Grand Challenge (McCollough, 2016).
Dataset Splits Yes The mean squared error (MSE) loss is computed over 1000 uniformly sampled points from [-1, 1] for training, and a uniform partition of [-1, 1] with 10000 points for testing. [...] We optimize the neural representation on 54000 images from the MNIST training set and use the 6000 remaining ones as validation set. The test set contains 10000 MNIST images. [...] we create validation and test sets consisting of 100 and 99 images, respectively [...] The validation set consists of 6 images taken uniformly from the first patient of the training set from Mukherjee et al. (2021). We use the same test set as Mukherjee et al. (2021), more precisely, 128 slices with size (512×512) that correspond to one patient.
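The 54000/6000 split of the 60000-image MNIST training set described above can be sketched as an index-based partition. This is a minimal illustration, not the authors' code: the random shuffle and seed are assumptions, since the paper does not state how the 6000 validation images were chosen.

```python
import random

def mnist_split_indices(n_train_total=60000, n_val=6000, seed=0):
    """Partition the MNIST training indices into train/validation sets.

    Assumes a random shuffle with a fixed seed; the standard 10000-image
    MNIST test set is kept separate and is not touched here.
    """
    rng = random.Random(seed)
    idx = list(range(n_train_total))
    rng.shuffle(idx)
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)

train_idx, val_idx = mnist_split_indices()
print(len(train_idx), len(val_idx))  # 54000 6000
```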
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. It mentions software like the 'PyTorch implementation' and the 'Adam optimizer' but no hardware.
Software Dependencies No All 1-Lip networks are learned with the Adam optimizer (Kingma and Ba, 2015) and the default hyperparameters of its PyTorch implementation. [...] we report the inception score on the MNIST test set using the implementation from Li et al. (2017) [...] which is part of the BART toolbox (Uecker et al., 2013). The paper mentions software tools and frameworks but does not specify their version numbers.
Experiment Setup Yes All 1-Lip networks are learned with the Adam optimizer (Kingma and Ba, 2015) and the default hyperparameters of its PyTorch implementation. For the parameters of the PReLU and HH activation functions, the learning rate is the same as for the weights of the network. The LLS networks use three different learning rates: η for the weights, η/4 for the scaling parameters α, and η/40 for the remaining parameters of the LLS. [...] ReLU networks have 10 layers and a width of 50; AV, PReLU, and HH networks have 8 layers and a width of 20; GS networks have 7 layers and a width of 20; LLS networks have 4 layers and a width of 10. We initialized the PReLU as the absolute value, we used GS with a group size of 5, and the LLS was initialized as ReLU and had a range of [-0.5, 0.5], 100 linear regions, and λ = 10⁻⁷ for the TV(2) regularization. Every network relied on Kaiming initialization (He et al., 2015) and was trained 25 times with a batch size of 10 for 1000 epochs. The LLS networks always used η = 2×10⁻³, while the other ones used η = 4×10⁻³ for f1, f2 and η = 10⁻³ for f3.
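The three-tier learning-rate scheme quoted above (η for the weights, η/4 for the scalings α, η/40 for the remaining LLS parameters) maps naturally onto per-parameter-group learning rates of the kind PyTorch's Adam accepts. The sketch below builds such groups as plain dicts so it stays self-contained; the function name and the dummy parameter lists are illustrative, not from the paper.

```python
# Hypothetical sketch of the per-parameter-group learning rates for an
# LLS network, mirroring the structure of PyTorch optimizer param groups
# (a list of {"params": ..., "lr": ...} dicts) without importing torch.

def lls_param_groups(weights, alphas, spline_coeffs, eta=2e-3):
    """Return Adam-style parameter groups with the paper's lr ratios."""
    return [
        {"params": weights, "lr": eta},             # network weights: eta
        {"params": alphas, "lr": eta / 4},          # scaling parameters alpha
        {"params": spline_coeffs, "lr": eta / 40},  # remaining LLS parameters
    ]

groups = lls_param_groups(["W1", "W2"], ["a1"], ["c1"])
print([g["lr"] for g in groups])  # [0.002, 0.0005, 5e-05]
```

With real tensors, the same list could be passed directly to `torch.optim.Adam(groups)`, which applies a distinct learning rate to each group.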