Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies

Authors: Alon Berliner, Guy Rotman, Yossi Adi, Roi Reichart, Tamir Hazan

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that optimizing discrete structured VAEs using NES is as effective as gradient-based approximations. Lastly, we prove NES converges for non-Lipschitz functions as appear in discrete structured VAEs.
Researcher Affiliation | Collaboration | Alon Berliner (Technion, IIT); Guy Rotman (Technion, IIT); Yossi Adi (Meta AI Research); Roi Reichart (Technion, IIT); Tamir Hazan (Technion, IIT)
Pseudocode | Yes | Algorithm 1: Natural Evolution Strategies for discrete VAEs (a hedged sketch of the underlying estimator appears below the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | In our experiments, we utilize the dataset developed by Paulus et al. [48]... We consider the Universal Dependencies (UD) dataset [35, 44, 45]... The experiments were conducted on the Fashion MNIST dataset [59] with fixed binarization [51]... Experiments are conducted on the Fashion MNIST [59], KMNIST [7], and Omniglot [29] datasets with fixed binarization [51].
Dataset Splits | Yes | All reported values are measured on a test set, and the models were selected using early stopping on the validation set.
Hardware Specification | Yes | All the following experiments were conducted using an internal cluster with 4 Tesla K80 NVIDIA GPUs.
Software Dependencies | No | The paper mentions software such as the Adam optimizer [21] and fastText embeddings [16], but it does not specify version numbers or list the software dependencies needed to reproduce the work.
Experiment Setup | Yes | We run our experiments with the same set of parameters as in Paulus et al. [48], except that during decoding we use teacher-forcing every 3 steps instead of 9 steps. We fix NES parameters to be σ = 0.01 and N = 600... We set the hyper-parameters to those of the original implementation of Kiperwasser & Goldberg [22] and feed the models with the multilingual fastText word embeddings [16]. We perform a grid-search for each of the methods separately over learning rates in [5×10⁻⁴, 1×10⁻⁵] and set the mini-batch size to 128. We fix NES parameters to be σ = 0.1 and N = 400. The Adam optimizer [21] is used to optimize all methods... All models were trained using the Adam optimizer [21] over 300 epochs with a constant learning rate of 10⁻³ and a batch size of 128. (The quoted hyper-parameters are collected in a sketch below the table.)
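The paper's Algorithm 1 is not reproduced here. As a rough illustration of the idea it builds on, the following NumPy sketch implements the standard antithetic NES gradient estimator with the paper's quoted defaults (σ = 0.01, N = 600); `f`, `theta`, and the toy objective are illustrative placeholders, not the authors' code.

```python
import numpy as np

def nes_gradient(f, theta, sigma=0.01, n=600, rng=None):
    """Antithetic NES estimate of grad_theta E_{eps ~ N(0, I)}[f(theta + sigma * eps)].

    f     : black-box fitness, e.g. an ELBO involving non-differentiable
            discrete structures; only function evaluations are needed.
    sigma : perturbation scale (the paper fixes 0.01 or 0.1 per task).
    n     : number of perturbations (the paper uses 600 or 400); assumed even.
    """
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n // 2):
        eps = rng.standard_normal(theta.shape)
        # Each antithetic pair (theta + sigma*eps, theta - sigma*eps) yields a
        # variance-reduced finite-difference term along the direction eps.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (n * sigma)

# Toy check on a smooth objective whose true gradient is -2 * theta.
f = lambda th: -np.sum(th ** 2)
theta = np.ones(5)
theta = theta + 1e-2 * nes_gradient(f, theta)  # ascent step on the fitness
```

Because the estimator needs only evaluations of f, it applies unchanged when the ELBO contains non-differentiable discrete structures, which is the setting the paper targets; antithetic sampling is one common variance-reduction choice, not necessarily the authors' exact variant.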
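Since no code was released, the quoted hyper-parameters can only be approximated. The minimal PyTorch sketch below collects them in one place; the two-layer model is a placeholder, and every value comes from the quotes above rather than from released code.

```python
import torch

# Placeholder decoder standing in for the generative-model parameters; the
# authors' architectures are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 200), torch.nn.ReLU(), torch.nn.Linear(200, 784)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # constant 10^-3
EPOCHS, BATCH_SIZE = 300, 128                              # generative experiments

# The UD dependency-parsing setup instead sweeps the learning rate over
# [5e-4, 1e-5] with batch size 128 and fixes the NES parameters:
NES_SIGMA, NES_SAMPLES = 0.1, 400
```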