WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
Authors: Hao Zhang, Bo Chen, Dandan Guo, Mingyuan Zhou
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. |
| Researcher Affiliation | Academia | Hao Zhang, Bo Chen & Dandan Guo, National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, China. EMAIL EMAIL EMAIL. Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA. EMAIL |
| Pseudocode | Yes | Algorithm 1 Hybrid stochastic-gradient MCMC and autoencoding variational inference for WHAI |
| Open Source Code | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).' but does not provide a specific link or explicit statement about releasing the source code for WHAI. |
| Open Datasets | Yes | We compare the performance of different algorithms on 20Newsgroups (20News), Reuters Corpus Volume I (RCV1), and Wikipedia (Wiki)... Wiki, with a vocabulary size of 7,702, consists of 10 million documents randomly downloaded from Wikipedia using the script provided for Hoffman et al. (2010). |
| Dataset Splits | No | for each corpus, we randomly select 70% of the word tokens from each document to form a training matrix T, holding out the remaining 30% to form a testing matrix Y. The paper specifies training and testing splits but does not mention a validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).', which refers to the framework but does not provide a specific version number for Theano or other software dependencies. |
| Experiment Setup | Yes | For the proposed model, we set the mini-batch size as 200, and use as burn-in 2000 mini-batches for both 20News and RCV1 and 3500 for Wiki. We collect 3000 samples after burn-in to calculate perplexity. The hyperparameters of WHAI are set as: η^(l) = 1/K_l, r = 1, and c_n^(l) = 1. |
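The experiment-setup row above can be summarized as a small configuration helper. This is a hypothetical sketch, not code from the paper (whose implementation is in Theano and was not released): the function name `whai_schedule` and the example layer sizes are illustrative; only the mini-batch size, burn-in counts, sample count, and hyperparameter settings come from the reported setup.

```python
def whai_schedule(corpus, layer_sizes, minibatch_size=200):
    """Return the WHAI training schedule and hyperparameters reported in the paper.

    corpus: one of "20News", "RCV1", "Wiki"
    layer_sizes: hypothetical list of topic counts K_l per layer
    """
    # Burn-in mini-batches: 2000 for 20News and RCV1, 3500 for Wiki.
    burn_in = {"20News": 2000, "RCV1": 2000, "Wiki": 3500}[corpus]
    return {
        "minibatch_size": minibatch_size,
        "burn_in_minibatches": burn_in,
        "collected_samples": 3000,              # samples used for perplexity
        "eta_per_layer": [1.0 / K for K in layer_sizes],  # eta^(l) = 1/K_l
        "r": 1.0,                               # r = 1
        "c_n": 1.0,                             # c_n^(l) = 1
    }
```

For example, `whai_schedule("Wiki", [128, 64, 32])` yields a 3500-mini-batch burn-in and per-layer eta values of 1/128, 1/64, and 1/32.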