Evolutionary Variational Optimization of Generative Models

Authors: Jakob Drefs, Enrico Guiraud, Jörg Lücke

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show general applicability, we apply the approach to three generative models (we use Noisy-OR Bayes Nets, Binary Sparse Coding, and Spike-and-Slab Sparse Coding). To demonstrate effectiveness and efficiency of the novel variational approach, we use the standard competitive benchmarks of image denoising and inpainting. The benchmarks allow quantitative comparisons to a wide range of methods including probabilistic approaches, deep deterministic and generative networks, and non-local image processing methods.
Researcher Affiliation | Academia | Jakob Drefs EMAIL, Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany; Enrico Guiraud EMAIL, CERN, 1211 Geneva, Switzerland, and Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany; Jörg Lücke EMAIL, Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany
Pseudocode | Yes | Algorithm 1: Evolutionary Variational Optimization (EVO). Define selection, crossover, and mutation operators; set hyperparameters S, Ng, etc.; initialize model parameters Θ; for each n, populate the set K(n) with S distinct latent states (|K(n)| = S). repeat: for n = 1, ..., N do: set K(n)_0 = K(n); for g = 1, ..., Ng do: K(n)_g = mutation(crossover(selection(K(n)_{g−1}))); K(n) = K(n) ∪ K(n)_g; remove the (|K(n)| − S) elements s of K(n) with lowest p(s, y(n) | Θ); use M-steps with (3) to update Θ; until parameters Θ have sufficiently converged.
Open Source Code | Yes | The source code can be accessed via https://github.com/tvlearn.
Open Datasets | Yes | We used image patches extracted from a standard image database (van Hateren and van der Schaaf, 1998). ... For our experiments with the Gumbel-Softmax Variational Autoencoder (GSVAE), we leveraged the source code provided by the original publication (Jang, 2016). For GSVAElin, we chose a categorical distribution with just two categories (0 or 1) which matches the Bernoulli prior of BSC. ... For CIFAR-10, the patch size was D = 32x32x3, and we used H = 1,024. ... For our measurements, we used a publicly available data set4. (Footnote 4: We downloaded the clean and the corrupted image from https://lear.inrialpes.fr/people/mairal/resources/KSVD_package.tar.gz and the text mask from https://www.visinf.tu-darmstadt.de/media/visinf/software/foe_demo-1_0.zip)
Dataset Splits | No | The paper describes generating synthetic datasets for verification and using standard benchmark images for denoising and inpainting in a zero-shot learning context (where the model trains on the corrupted image itself). For the CIFAR-10 dataset, it refers to a 'test data set' but does not specify the splitting methodology or proportions the authors themselves used in their experiments.
Hardware Specification | Yes | Small scale experiments such as the bars tests described in Section 3.2 were performed on standalone workstations with four-core CPUs; runtimes were in the range of minutes. For most of the large-scale experiments described in Sections 3.3 and 4, we used the HPC cluster CARL of the University of Oldenburg to execute our algorithms. The cluster is equipped with several hundred compute nodes with in total several thousands of CPU cores (Intel Xeon E5-2650 v4 12C, E5-2667 v4 8C and E7-8891 v4 10C). ... For the experiments with GSVAE, the code provided by the original publication (Jang, 2016) can execute optimization on GPU cores as is customary for deep models. When executing, for example, the CIFAR-10 experiment (Figure 11) on a single NVIDIA Tesla V100 16GB GPU, we observed runtimes of GSVAElin on the order of a few seconds per iteration (on average approximately 10 s); for the CIFAR-10 experiment, we observed GSVAElin to converge within approximately 150 iterations. In comparison, to train EBSC on CIFAR-10, we executed our implementation in parallel on 768 CPU cores (Intel Xeon Platinum 9242) and performed 750 iterations of the algorithm.
Software Dependencies | No | We implemented EBSC and ES3C in Python using MPI-based parallelization for execution on multiple processors. ... For the experiments with GSVAE, the code provided by the original publication (Jang, 2016) can execute optimization on GPU cores as is customary for deep models. No specific version numbers for Python or any libraries are mentioned.
Experiment Setup | Yes | For NOR, priors πh were initialized to an arbitrary sparse value (typically 1/H) while the initial weights Winit were sampled from the standard uniform distribution. The K(n) sets were initialized by sampling bits from a Bernoulli distribution that encouraged sparsity. In our experiments, the mean of the distribution was set to 1/H. For BSC, the initial value of the prior was defined πinit = 1/H; the value of (σinit)2 was set to the variance of the data points averaged over the observed dimensions. The columns of the Winit matrix were initialized with the mean of the data points to which some Gaussian noise with a standard deviation of 0.25 σinit was added. The initial values of the K(n) sets were drawn from a Bernoulli distribution with p(sh = 1) = 1/H. To initialize the SSSC model, we uniformly randomly drew πinit and µinit from the interval [0.1, 0.5] and [1, 5], respectively (compare Sheikh et al., 2014) and set Ψinit to the unit matrix. For SSSC, we proceeded as we had done for the BSC model to initialize the dictionary W, the variance parameter σ2 and the sets K(n). ... Table 4 lists the hyperparameters employed in the numerical experiments on verification (Section 3.2), scalability (Section 3.3), denoising (Section 4.2) and inpainting (Section 4.3). EVO hyperparameters (S, Np, Nm, Ng) were chosen such that they provided a reasonable trade-off between the accuracy of the approximate inference scheme and the runtime of the algorithm.
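The EVO loop quoted in the Pseudocode row above can be sketched in Python. This is a minimal toy sketch, not the authors' implementation: `log_joint` stands in for the model-specific log p(s, y | Θ) (Noisy-OR, BSC, or SSSC in the paper), and the operator details used here (single-point crossover, independent bitflip mutation, a toy Gaussian fitness) are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(s, y, theta):
    # Hypothetical stand-in for log p(s, y | Theta); model-specific in
    # the paper. Toy Gaussian likelihood plus a sparsity-favoring term.
    return -np.sum(s) - np.sum((y - s @ theta) ** 2)

def selection(K, y, theta, n_parents):
    # Keep the n_parents states with the highest joint ("fitness").
    scores = np.array([log_joint(s, y, theta) for s in K])
    order = np.argsort(scores)[::-1][:n_parents]
    return [K[i] for i in order]

def crossover(parents):
    # Single-point crossover between randomly chosen parent pairs.
    H = len(parents[0])
    children = []
    for _ in range(len(parents)):
        a, b = rng.choice(len(parents), size=2, replace=False)
        cut = int(rng.integers(1, H))
        children.append(np.concatenate([parents[a][:cut], parents[b][cut:]]))
    return children

def mutation(children, p_flip=0.05):
    # Flip each bit independently with probability p_flip.
    return [np.where(rng.random(len(s)) < p_flip, 1 - s, s) for s in children]

def evo_update(K, y, theta, n_generations=3, n_parents=4):
    # Inner loop of Algorithm 1 for one data point: evolve new candidate
    # states, merge them into K(n), then truncate back to at most S states
    # by discarding those with the lowest joint probability.
    S = len(K)
    K_g = list(K)
    for _ in range(n_generations):
        K_g = mutation(crossover(selection(K_g, y, theta, n_parents)))
        K = K + K_g
    unique = list({tuple(s): s for s in K}.values())
    unique.sort(key=lambda s: log_joint(s, y, theta), reverse=True)
    return unique[:S]
```

In the full algorithm, `evo_update` would be run for every data point n before each M-step update of Θ, so the variational sets K(n) track high-probability states as the parameters improve.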
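The BSC initialization described in the Experiment Setup row can likewise be sketched. The function name `init_bsc` and all array shapes are our assumptions for illustration, not the authors' code; it only mirrors the quoted recipe (πinit = 1/H, σinit² as the per-dimension data variance averaged over dimensions, dictionary columns as the data mean plus Gaussian noise with standard deviation 0.25 σinit, and K(n) bits drawn from a Bernoulli(1/H) distribution).

```python
import numpy as np

def init_bsc(Y, H, S, seed=0):
    # Y: data matrix of shape (N, D); H: number of latents; S: states per
    # variational set K(n). Names and shapes are ours, for illustration.
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    pi_init = 1.0 / H  # sparse prior, pi_init = 1/H
    # sigma_init^2: variance of the data averaged over observed dimensions.
    sigma2_init = float(np.mean(np.var(Y, axis=0)))
    # Each column of W_init: data mean plus Gaussian noise (std 0.25*sigma).
    noise = 0.25 * np.sqrt(sigma2_init) * rng.standard_normal((D, H))
    W_init = Y.mean(axis=0)[:, None] + noise
    # K(n) sets: S binary latent states per data point, bits ~ Bernoulli(1/H).
    K_init = (rng.random((N, S, H)) < pi_init).astype(int)
    return pi_init, sigma2_init, W_init, K_init
```

Per the quoted text, the SSSC model reuses this recipe for W, σ² and the K(n) sets, adding uniform draws for πinit and µinit and a unit matrix for Ψinit.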