WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
Authors: Hao Zhang, Bo Chen, Dandan Guo, Mingyuan Zhou
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. |
| Researcher Affiliation | Academia | Hao Zhang, Bo Chen & Dandan Guo, National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, China. EMAIL EMAIL EMAIL. Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA. EMAIL |
| Pseudocode | Yes | Algorithm 1 Hybrid stochastic-gradient MCMC and autoencoding variational inference for WHAI |
| Open Source Code | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).' but does not provide a specific link or explicit statement about releasing the source code for WHAI. |
| Open Datasets | Yes | We compare the performance of different algorithms on 20Newsgroups (20News), Reuters Corpus Volume I (RCV1), and Wikipedia (Wiki)... Wiki, with a vocabulary size of 7,702, consists of 10 million documents randomly downloaded from Wikipedia using the script provided for Hoffman et al. (2010). |
| Dataset Splits | No | for each corpus, we randomly select 70% of the word tokens from each document to form a training matrix T, holding out the remaining 30% to form a testing matrix Y. The paper specifies training and testing splits but does not mention a validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).', which refers to the framework but does not provide a specific version number for Theano or other software dependencies. |
| Experiment Setup | Yes | For the proposed model, we set the mini-batch size as 200, and use as burn-in 2000 mini-batches for both 20News and RCV1 and 3500 for Wiki. We collect 3000 samples after burn-in to calculate perplexity. The hyperparameters of WHAI are set as: η^(l) = 1/K_l, r = 1, and c_n^(l) = 1. |
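The experiment-setup row above can be summarized as a small configuration helper. This is a hypothetical sketch, not code from the paper (whose implementation is in Theano and was not released): the function name `whai_schedule` and the example layer sizes are illustrative; only the mini-batch size, burn-in counts, sample count, and hyperparameter settings come from the reported setup.

```python
def whai_schedule(corpus, layer_sizes, minibatch_size=200):
    """Return the WHAI training schedule and hyperparameters reported in the paper.

    corpus: one of "20News", "RCV1", "Wiki"
    layer_sizes: hypothetical list of topic counts K_l per layer
    """
    # Burn-in mini-batches: 2000 for 20News and RCV1, 3500 for Wiki.
    burn_in = {"20News": 2000, "RCV1": 2000, "Wiki": 3500}[corpus]
    return {
        "minibatch_size": minibatch_size,
        "burn_in_minibatches": burn_in,
        "collected_samples": 3000,              # samples used for perplexity
        "eta_per_layer": [1.0 / K for K in layer_sizes],  # eta^(l) = 1/K_l
        "r": 1.0,                               # r = 1
        "c_n": 1.0,                             # c_n^(l) = 1
    }
```

For example, `whai_schedule("Wiki", [128, 64, 32])` yields a 3500-mini-batch burn-in and per-layer eta values of 1/128, 1/64, and 1/32.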