Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies

Authors: Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that optimizing discrete structured VAEs using NES is as effective as gradient-based approximations. Lastly, we prove NES converges for non-Lipschitz functions as appear in discrete structured VAEs.
Researcher Affiliation | Collaboration | Alon Berliner (Technion, IIT); Guy Rotman (Technion, IIT); Yossi Adi (Meta AI Research); Roi Reichart (Technion, IIT); Tamir Hazan (Technion, IIT)
Pseudocode | Yes | Algorithm 1: Natural Evolution Strategies for discrete VAEs (a hedged sketch of the underlying estimator appears below the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | In our experiments, we utilize the dataset developed by Paulus et al. [48]... We consider the Universal Dependencies (UD) dataset [35, 44, 45]... The experiments were conducted on the Fashion MNIST dataset [59] with fixed binarization [51]... Experiments are conducted on the Fashion MNIST [59], KMNIST [7], and Omniglot [29] datasets with fixed binarization [51].
Dataset Splits | Yes | All reported values are measured on a test set, and the models were selected using early stopping on the validation set.
Hardware Specification | Yes | All the following experiments were conducted using an internal cluster with 4 Tesla K80 NVIDIA GPUs.
Software Dependencies | No | The paper mentions software such as the Adam optimizer [21] and fastText embeddings [16], but it does not specify version numbers or list the software dependencies needed to reproduce the work.
Experiment Setup | Yes | We run our experiments with the same set of parameters as in Paulus et al. [48], except that during decoding we use teacher-forcing every 3 steps instead of 9 steps. We fix NES parameters to be σ = 0.01 and N = 600... We set the hyper-parameters to those of the original implementation of Kiperwasser & Goldberg [22] and feed the models with the multilingual fastText word embeddings [16]. We perform a grid-search for each of the methods separately over learning rates in [5×10⁻⁴, 1×10⁻⁵] and set the mini-batch size to 128. We fix NES parameters to be σ = 0.1 and N = 400. The Adam optimizer [21] is used to optimize all methods... All models were trained using the Adam optimizer [21] over 300 epochs with a constant learning rate of 10⁻³ and a batch size of 128. (The quoted hyper-parameters are collected in a sketch below the table.)
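The paper's Algorithm 1 is not reproduced here. As a rough illustration of the idea it builds on, the following NumPy sketch implements the standard antithetic NES gradient estimator with the paper's quoted defaults (σ = 0.01, N = 600); `f`, `theta`, and the toy objective are illustrative placeholders, not the authors' code.

```python
import numpy as np

def nes_gradient(f, theta, sigma=0.01, n=600, rng=None):
    """Antithetic NES estimate of grad_theta E_{eps ~ N(0, I)}[f(theta + sigma * eps)].

    f     : black-box fitness, e.g. an ELBO involving non-differentiable
            discrete structures; only function evaluations are needed.
    sigma : perturbation scale (the paper fixes 0.01 or 0.1 per task).
    n     : number of perturbations (the paper uses 600 or 400); assumed even.
    """
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n // 2):
        eps = rng.standard_normal(theta.shape)
        # Each antithetic pair (theta + sigma*eps, theta - sigma*eps) yields a
        # variance-reduced finite-difference term along the direction eps.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (n * sigma)

# Toy check on a smooth objective whose true gradient is -2 * theta.
f = lambda th: -np.sum(th ** 2)
theta = np.ones(5)
theta = theta + 1e-2 * nes_gradient(f, theta)  # ascent step on the fitness
```

Because the estimator needs only evaluations of f, it applies unchanged when the ELBO contains non-differentiable discrete structures, which is the setting the paper targets; antithetic sampling is one common variance-reduction choice, not necessarily the authors' exact variant.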
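Since no code was released, the quoted hyper-parameters can only be approximated. The minimal PyTorch sketch below collects them in one place; the two-layer model is a placeholder, and every value comes from the quotes above rather than from released code.

```python
import torch

# Placeholder decoder standing in for the generative-model parameters; the
# authors' architectures are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 200), torch.nn.ReLU(), torch.nn.Linear(200, 784)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # constant 10^-3
EPOCHS, BATCH_SIZE = 300, 128                              # generative experiments

# The UD dependency-parsing setup instead sweeps the learning rate over
# [5e-4, 1e-5] with batch size 128 and fixes the NES parameters:
NES_SIGMA, NES_SAMPLES = 0.1, 400
```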