Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions
Authors: Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments show that our trained networks compare favorably with existing 1-Lipschitz neural architectures. [...] Application: First, we systematically assess the practical expressivity of various 1-Lip architectures based on function fitting, Wasserstein-1 distance estimation and Wasserstein GAN training. Then, as our main application, we perform image reconstruction within the popular PnP framework. |
| Researcher Affiliation | Academia | Stanislas Ducotterd EMAIL Alexis Goujon EMAIL Pakshal Bohra EMAIL Dimitris Perdios EMAIL Sebastian Neumayer EMAIL Michael Unser EMAIL Biomedical Imaging Group, École polytechnique fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. Methods are described using prose and mathematical equations. |
| Open Source Code | Yes | Our code is accessible on GitHub¹. 1. https://github.com/StanislasDucotterd/LipschitzDSNN |
| Open Datasets | Yes | MNIST: Here, P1 is a uniform distribution over a set of real MNIST² images and P2 is the generator distribution of a GAN trained to generate MNIST images. 2. http://yann.lecun.com/exdb/mnist/ [...] The training dataset consists of 238400 patches of size (40×40) taken from the BSD500 image dataset (Arbelaez et al., 2011). [...] we use fully sampled knee MR images of size (320×320) from the fastMRI dataset (Knoll et al., 2020) as ground truths. [...] The ground truth comes from human abdominal CT scans for 10 patients provided by Mayo Clinic for the low-dose CT Grand Challenge (McCollough, 2016). |
| Dataset Splits | Yes | The mean squared error (MSE) loss is computed over 1000 uniformly sampled points from [-1, 1] for training, and a uniform partition of [-1, 1] with 10000 points for testing. [...] We optimize the neural representation on 54000 images from the MNIST training set and use the 6000 remaining ones as validation set. The test set contains 10000 MNIST images. [...] we create validation and test sets consisting of 100 and 99 images, respectively [...] The validation set consists of 6 images taken uniformly from the first patient of the training set from Mukherjee et al. (2021). We use the same test set as Mukherjee et al. (2021), more precisely, 128 slices with size (512×512) that correspond to one patient. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. It mentions software like 'PyTorch implementation' and 'Adam optimizer' but no hardware. |
| Software Dependencies | No | All 1-Lip networks are learned with the Adam optimizer (Kingma and Ba, 2015) and the default hyperparameters of its PyTorch implementation. [...] we report the inception score on the MNIST test set using the implementation from Li et al. (2017) [...] which is part of the BART toolbox (Uecker et al., 2013). The paper mentions software tools and frameworks but does not specify their version numbers. |
| Experiment Setup | Yes | All 1-Lip networks are learned with the Adam optimizer (Kingma and Ba, 2015) and the default hyperparameters of its PyTorch implementation. For the parameters of the PReLU and HH activation functions, the learning rate is the same as for the weights of the network. The LLS networks use three different learning rates: η for the weights, η/4 for the scaling parameters α, and η/40 for the remaining parameters of the LLS. [...] ReLU networks have 10 layers and a width of 50; AV, PReLU, and HH networks have 8 layers and a width of 20; GS networks have 7 layers and a width of 20; LLS networks have 4 layers and a width of 10. We initialized the PReLU as the absolute value, we used GS with a group size of 5, and the LLS was initialized as ReLU and had a range of [-0.5, 0.5], 100 linear regions, and λ = 10⁻⁷ for the TV(2) regularization. Every network relied on Kaiming initialization (He et al., 2015) and was trained 25 times with a batch size of 10 for 1000 epochs. The LLS networks always used η = 2×10⁻³, while the other ones used η = 4×10⁻³ for f1, f2 and η = 10⁻³ for f3. |
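The three-tier learning-rate scheme quoted above (η for the weights, η/4 for the scaling parameters α, η/40 for the remaining LLS parameters) maps naturally onto optimizer parameter groups. The sketch below illustrates the grouping logic in plain Python; the parameter names (`alpha`, `spline`) and the helper itself are illustrative assumptions, not taken from the authors' code, and the resulting list is in the shape accepted by `torch.optim.Adam`.

```python
# Sketch of the per-group learning rates reported for the LLS networks:
# eta for weights, eta/4 for scaling parameters alpha, eta/40 for the
# remaining spline parameters. Naming conventions here are hypothetical.

def build_param_groups(named_params, eta=2e-3):
    """Split (name, param) pairs into three groups with scaled learning rates."""
    groups = {
        "weights": {"params": [], "lr": eta},        # network weights
        "scaling": {"params": [], "lr": eta / 4},    # scaling parameters alpha
        "spline":  {"params": [], "lr": eta / 40},   # remaining LLS parameters
    }
    for name, p in named_params:
        if "alpha" in name:
            groups["scaling"]["params"].append(p)
        elif "spline" in name:
            groups["spline"]["params"].append(p)
        else:
            groups["weights"]["params"].append(p)
    return [groups["weights"], groups["scaling"], groups["spline"]]

# Example: the returned list could be passed to torch.optim.Adam(groups).
params = [("layer1.weight", 0), ("layer1.alpha", 1), ("layer1.spline_coeffs", 2)]
groups = build_param_groups(params, eta=2e-3)  # eta = 2e-3 as in the LLS setup
```

With η = 2×10⁻³ as quoted for the LLS networks, this yields group learning rates of 2×10⁻³, 5×10⁻⁴, and 5×10⁻⁵.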