Evolutionary Variational Optimization of Generative Models

Authors: Jakob Drefs, Enrico Guiraud, Jörg Lücke

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show general applicability, we apply the approach to three generative models (we use Noisy-OR Bayes Nets, Binary Sparse Coding, and Spike-and-Slab Sparse Coding). To demonstrate effectiveness and efficiency of the novel variational approach, we use the standard competitive benchmarks of image denoising and inpainting. The benchmarks allow quantitative comparisons to a wide range of methods including probabilistic approaches, deep deterministic and generative networks, and non-local image processing methods.
Researcher Affiliation | Academia | Jakob Drefs EMAIL, Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany; Enrico Guiraud EMAIL, CERN, 1211 Geneva, Switzerland, and Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany; Jörg Lücke EMAIL, Machine Learning Lab, University of Oldenburg, 26129 Oldenburg, Germany
Pseudocode | Yes | Algorithm 1: Evolutionary Variational Optimization (EVO). Define selection, crossover, and mutation operators; set hyperparameters S, Ng, etc.; initialize model parameters Θ; for each n, populate the set K(n) with S distinct latent states (|K(n)| = S). repeat: for n = 1, ..., N do: set K(n)_0 = K(n); for g = 1, ..., Ng do: K(n)_g = mutation(crossover(selection(K(n)_{g−1}))); K(n) = K(n) ∪ K(n)_g; remove the (|K(n)| − S) elements s of K(n) with lowest p(s, y(n) | Θ); use M-steps with (3) to update Θ; until parameters Θ have sufficiently converged.
Open Source Code | Yes | The source code can be accessed via https://github.com/tvlearn.
Open Datasets | Yes | We used image patches extracted from a standard image database (van Hateren and van der Schaaf, 1998). ... For our experiments with the Gumbel-Softmax Variational Autoencoder (GSVAE), we leveraged the source code provided by the original publication (Jang, 2016). For GSVAElin, we chose a categorical distribution with just two categories (0 or 1) which matches the Bernoulli prior of BSC. ... For CIFAR-10, the patch size was D = 32x32x3, and we used H = 1,024. ... For our measurements, we used a publicly available data set4. (Footnote 4: We downloaded the clean and the corrupted image from https://lear.inrialpes.fr/people/mairal/resources/KSVD_package.tar.gz and the text mask from https://www.visinf.tu-darmstadt.de/media/visinf/software/foe_demo-1_0.zip)
Dataset Splits | No | The paper describes generating synthetic datasets for verification and using standard benchmark images for denoising and inpainting in a zero-shot learning context (where the model trains on the corrupted image itself). For the CIFAR-10 dataset, it refers to a 'test data set' but does not specify the splitting methodology or proportions the authors themselves used in their experiments.
Hardware Specification | Yes | Small scale experiments such as the bars tests described in Section 3.2 were performed on standalone workstations with four-core CPUs; runtimes were in the range of minutes. For most of the large-scale experiments described in Sections 3.3 and 4, we used the HPC cluster CARL of the University of Oldenburg to execute our algorithms. The cluster is equipped with several hundred compute nodes with in total several thousands of CPU cores (Intel Xeon E5-2650 v4 12C, E5-2667 v4 8C and E7-8891 v4 10C). ... For the experiments with GSVAE, the code provided by the original publication (Jang, 2016) can execute optimization on GPU cores as is customary for deep models. When executing, for example, the CIFAR-10 experiment (Figure 11) on a single NVIDIA Tesla V100 16GB GPU, we observed runtimes of GSVAElin on the order of a few seconds per iteration (on average approximately 10 s); for the CIFAR-10 experiment, we observed GSVAElin to converge within approximately 150 iterations. In comparison, to train EBSC on CIFAR-10, we executed our implementation in parallel on 768 CPU cores (Intel Xeon Platinum 9242) and performed 750 iterations of the algorithm.
Software Dependencies | No | We implemented EBSC and ES3C in Python using MPI-based parallelization for execution on multiple processors. ... For the experiments with GSVAE, the code provided by the original publication (Jang, 2016) can execute optimization on GPU cores as is customary for deep models. No specific version numbers for Python or any libraries are mentioned.
Experiment Setup | Yes | For NOR, priors πh were initialized to an arbitrary sparse value (typically 1/H) while the initial weights Winit were sampled from the standard uniform distribution. The K(n) sets were initialized by sampling bits from a Bernoulli distribution that encouraged sparsity. In our experiments, the mean of the distribution was set to 1/H. For BSC, the initial value of the prior was defined πinit = 1/H; the value of (σinit)2 was set to the variance of the data points averaged over the observed dimensions. The columns of the Winit matrix were initialized with the mean of the data points to which some Gaussian noise with a standard deviation of 0.25 σinit was added. The initial values of the K(n) sets were drawn from a Bernoulli distribution with p(sh = 1) = 1/H. To initialize the SSSC model, we uniformly randomly drew πinit and µinit from the interval [0.1, 0.5] and [1, 5], respectively (compare Sheikh et al., 2014) and set Ψinit to the unit matrix. For SSSC, we proceeded as we had done for the BSC model to initialize the dictionary W, the variance parameter σ2 and the sets K(n). ... Table 4 lists the hyperparameters employed in the numerical experiments on verification (Section 3.2), scalability (Section 3.3), denoising (Section 4.2) and inpainting (Section 4.3). EVO hyperparameters (S, Np, Nm, Ng) were chosen such that they provided a reasonable trade-off between the accuracy of the approximate inference scheme and the runtime of the algorithm.
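The EVO loop quoted in the Pseudocode row above can be sketched in Python. This is a minimal toy sketch, not the authors' implementation: `log_joint` stands in for the model-specific log p(s, y | Θ) (Noisy-OR, BSC, or SSSC in the paper), and the operator details used here (single-point crossover, independent bitflip mutation, a toy Gaussian fitness) are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(s, y, theta):
    # Hypothetical stand-in for log p(s, y | Theta); model-specific in
    # the paper. Toy Gaussian likelihood plus a sparsity-favoring term.
    return -np.sum(s) - np.sum((y - s @ theta) ** 2)

def selection(K, y, theta, n_parents):
    # Keep the n_parents states with the highest joint ("fitness").
    scores = np.array([log_joint(s, y, theta) for s in K])
    order = np.argsort(scores)[::-1][:n_parents]
    return [K[i] for i in order]

def crossover(parents):
    # Single-point crossover between randomly chosen parent pairs.
    H = len(parents[0])
    children = []
    for _ in range(len(parents)):
        a, b = rng.choice(len(parents), size=2, replace=False)
        cut = int(rng.integers(1, H))
        children.append(np.concatenate([parents[a][:cut], parents[b][cut:]]))
    return children

def mutation(children, p_flip=0.05):
    # Flip each bit independently with probability p_flip.
    return [np.where(rng.random(len(s)) < p_flip, 1 - s, s) for s in children]

def evo_update(K, y, theta, n_generations=3, n_parents=4):
    # Inner loop of Algorithm 1 for one data point: evolve new candidate
    # states, merge them into K(n), then truncate back to at most S states
    # by discarding those with the lowest joint probability.
    S = len(K)
    K_g = list(K)
    for _ in range(n_generations):
        K_g = mutation(crossover(selection(K_g, y, theta, n_parents)))
        K = K + K_g
    unique = list({tuple(s): s for s in K}.values())
    unique.sort(key=lambda s: log_joint(s, y, theta), reverse=True)
    return unique[:S]
```

In the full algorithm, `evo_update` would be run for every data point n before each M-step update of Θ, so the variational sets K(n) track high-probability states as the parameters improve.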
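The BSC initialization described in the Experiment Setup row can likewise be sketched. The function name `init_bsc` and all array shapes are our assumptions for illustration, not the authors' code; it only mirrors the quoted recipe (πinit = 1/H, σinit² as the per-dimension data variance averaged over dimensions, dictionary columns as the data mean plus Gaussian noise with standard deviation 0.25 σinit, and K(n) bits drawn from a Bernoulli(1/H) distribution).

```python
import numpy as np

def init_bsc(Y, H, S, seed=0):
    # Y: data matrix of shape (N, D); H: number of latents; S: states per
    # variational set K(n). Names and shapes are ours, for illustration.
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    pi_init = 1.0 / H  # sparse prior, pi_init = 1/H
    # sigma_init^2: variance of the data averaged over observed dimensions.
    sigma2_init = float(np.mean(np.var(Y, axis=0)))
    # Each column of W_init: data mean plus Gaussian noise (std 0.25*sigma).
    noise = 0.25 * np.sqrt(sigma2_init) * rng.standard_normal((D, H))
    W_init = Y.mean(axis=0)[:, None] + noise
    # K(n) sets: S binary latent states per data point, bits ~ Bernoulli(1/H).
    K_init = (rng.random((N, S, H)) < pi_init).astype(int)
    return pi_init, sigma2_init, W_init, K_init
```

Per the quoted text, the SSSC model reuses this recipe for W, σ² and the K(n) sets, adding uniform draws for πinit and µinit and a unit matrix for Ψinit.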