Decoupling Sparsity and Smoothness in the Dirichlet Variational Autoencoder Topic Model

Authors: Sophie Burkhardt, Stefan Kramer

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that our method is competitive with other recent VAE topic models. Keywords: variational autoencoders, topic models, Dirichlet distribution, reparameterization, generative models." ... "We perform an extensive experimental comparison with state-of-the-art VAE topic models to show that our model achieves the highest topic coherence." ... "All models are evaluated on three measures, perplexity, topic redundancy and topic coherence."
Researcher Affiliation | Academia | Sophie Burkhardt (EMAIL), Stefan Kramer (EMAIL), University of Mainz, Department of Computer Science, Staudingerweg 9, 55128 Mainz, Germany
Pseudocode | No | The paper includes illustrations of neural network architectures (Figure 1, Figure 2) and mathematical equations, but no explicitly labeled "Pseudocode" or "Algorithm" block with structured steps.
Open Source Code | No | The paper cites implementations provided by other authors for the baseline models, for example "We use the implementation provided by the authors." for ProdLDA and "We used the code provided by the authors." for NVDM. However, there is no statement or link indicating that the authors release source code for their own proposed DVAE or DVAE Sparse models.
Open Datasets | Yes | 20news: "We used the same version of this data set as Srivastava and Sutton (2017)." NIPS: "The NIPS data set was retrieved in a preprocessed format from the UCI Machine Learning Repository (Perrone et al., 2017)." KOS: "The 3,430 blog entries of this data set were originally extracted from http://www.dailykos.com/"; the data set is available in the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Bag+of+Words. Rcv1: "This corpus contains 810,000 documents of Reuters news stories. We separated 10,000 documents as a test set and pruned the vocabulary to 10,000 words after stopword removal following Miao et al. (2016)."
Dataset Splits | Yes | 20news: "...11,000 training instances and a 2,000 word vocabulary." NIPS: "...1,000 documents were separated as a test set." Rcv1: "...We separated 10,000 documents as a test set and pruned the vocabulary to 10,000 words after stopword removal following Miao et al. (2016)."
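The holdout splits described above (e.g. 10,000 of Rcv1's 810,000 documents reserved for testing) can be reproduced with a simple deterministic split. The function below is an illustrative sketch; the shuffling and seed are assumptions, since the paper does not specify how the held-out documents were chosen.

```python
import random

def holdout_split(docs, n_test, seed=0):
    """Hold out n_test documents as a test set.

    The random shuffle and fixed seed are assumptions for reproducibility,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    idx = list(range(len(docs)))
    rng.shuffle(idx)
    test_idx = set(idx[:n_test])
    train = [d for i, d in enumerate(docs) if i not in test_idx]
    test = [docs[i] for i in sorted(test_idx)]
    return train, test

# Rcv1-style split: 810,000 documents, 10,000 held out for testing.
corpus = [f"doc-{i}" for i in range(810_000)]
train_docs, test_docs = holdout_split(corpus, n_test=10_000)
```

A fixed seed makes the split repeatable across runs, which matters when comparing perplexity numbers between models.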
Hardware Specification | No | The paper does not mention specific hardware such as GPU/CPU models, processors, or memory used for the experiments; it describes the experimental settings without any hardware specifications.
Software Dependencies | No | The paper mentions software components such as the "Adam optimizer" and the "tensorflow library" but does not provide version numbers for these or any other key software dependencies.
Experiment Setup | Yes | "The number of neurons was set to 100 for all hidden layers in the neural network models. We used a single sample for all methods. A batch size of 200 was used for the training of all models. Training was monitored on a validation set with early stopping and a look-ahead of 30 iterations. We used the Adam optimizer for training." ... "The learning rate is optimized for all neural network methods using Bayesian optimization."
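The early-stopping rule quoted above (monitor a validation set, stop after a look-ahead of 30 non-improving iterations) can be sketched in a few lines. This is an illustrative implementation of the stated setup, not the authors' code; the loss values and loop structure are hypothetical.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for
    `patience` consecutive iterations (the paper's look-ahead of 30)."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.since_best = 0

    def step(self, val_loss):
        """Record one validation measurement; return True to stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.since_best = 0
        else:
            self.since_best += 1
        return self.since_best >= self.patience

# Hypothetical training loop with the stated hyperparameters in mind
# (100-unit hidden layers, batch size 200, Adam) -- only the stopping
# logic is exercised here, on a toy validation-loss curve.
stopper = EarlyStopping(patience=30)
losses = [1.0 / (t + 1) for t in range(50)] + [0.5] * 40
stopped_at = None
for t, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = t
        break
```

The loss improves for the first 50 iterations and then plateaus, so the monitor fires 30 iterations after the last improvement.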