Document Neural Autoregressive Distribution Estimation

Authors: Stanislas Lauly, Yin Zheng, Alexandre Allauzen, Hugo Larochelle

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
To compare the topic models, two quantitative measures are used. The first evaluates the generative ability of the different models by computing the perplexity on held-out texts. The second compares the quality of the document representations for information retrieval. Two datasets are used for the experiments in this section: a small one, 20 Newsgroups, and a relatively big one, RCV1-V2 (Reuters Corpus Volume I).
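The second measure, retrieval quality, can be sketched as follows. This is a minimal illustration, assuming (as is common for this benchmark) that each held-out document's hidden representation queries the training set by cosine similarity, and precision is the fraction of the top-k retrieved documents that share the query's class label; all function and variable names here are illustrative, not the authors' code.

```python
import numpy as np

def retrieval_precision(query_vecs, query_labels, db_vecs, db_labels, k):
    """Mean fraction of the k most similar database documents
    whose class label matches the query's label."""
    # L2-normalize so the dot product equals cosine similarity
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = q @ d.T                           # (n_queries, n_db)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of k nearest documents
    hits = db_labels[topk] == query_labels[:, None]
    return hits.mean()
```

In the paper's setup the vectors would be the models' 50-dimensional hidden representations; here any real-valued document vectors work.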
Researcher Affiliation: Collaboration
Stanislas Lauly (EMAIL), Département d'informatique, Université de Sherbrooke, Sherbrooke, Québec, Canada; Yin Zheng (EMAIL), Tencent AI Lab, Shenzhen, Guangdong, China; Alexandre Allauzen (EMAIL), LIMSI-CNRS, Université Paris-Sud, Orsay, France; Hugo Larochelle (EMAIL), Département d'informatique, Université de Sherbrooke, Sherbrooke, Québec, Canada
Pseudocode: Yes
Algorithm 1: Computing the cost for training and the hidden layer representing an entire document (DocNADE model)
    Input: multinomial observation v
    a ← c
    NLL ← 0
    for i = 1 to D do
        h_i ← g(a)
        p(v_i) ← 1
        for m = 1 to |π(v_i)| do
            p(π(v_i)_m = 1 | v_<i) ← sigm(b_{l(v_i)_m} + V_{l(v_i)_m,:} h_i(v_<i))
            p(v_i) ← p(v_i) · p(π(v_i)_m = 1 | v_<i)^{π(v_i)_m} · (1 − p(π(v_i)_m = 1 | v_<i))^{1 − π(v_i)_m}
        end for
        NLL ← NLL − log p(v_i)
        a ← a + W_{:,v_i}
    end for
    h_final ← g(a)
    return NLL, h_final
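As a reading aid, Algorithm 1 can be sketched in NumPy. This is a toy implementation, assuming a complete binary tree over the vocabulary with heap-style node indexing and a sigmoid hidden activation g; the parameter names and sizes (W, V_tree, b_tree, c, DEPTH) are illustrative, not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

V_SIZE, H = 8, 5                      # toy vocabulary size and hidden dimension
DEPTH = 3                             # tree depth: 2**3 = 8 leaf words
W = rng.normal(0, 0.1, (H, V_SIZE))   # word embeddings, one column per word
c = np.zeros(H)                       # hidden bias
V_tree = rng.normal(0, 0.1, (2**DEPTH - 1, H))  # one weight row per internal node
b_tree = np.zeros(2**DEPTH - 1)                 # one bias per internal node

def g(a):
    """Hidden activation (sigmoid in this sketch)."""
    return 1.0 / (1.0 + np.exp(-a))

def path_and_decisions(word):
    """Internal-node indices l(v) and left/right bits pi(v) for a word's leaf."""
    nodes, bits, node = [], [], 0
    for d in reversed(range(DEPTH)):
        bit = (word >> d) & 1         # 1 means "go right" at this depth
        nodes.append(node)
        bits.append(bit)
        node = 2 * node + 1 + bit     # heap-style child index
    return nodes, bits

def docnade_nll(doc):
    """NLL of a document (list of word ids) and its final hidden vector."""
    a = c.copy()
    nll = 0.0
    for v_i in doc:
        h = g(a)                      # hidden layer for the prefix v_<i
        log_p = 0.0
        for node, bit in zip(*path_and_decisions(v_i)):
            q = 1.0 / (1.0 + np.exp(-(b_tree[node] + V_tree[node] @ h)))
            log_p += np.log(q) if bit else np.log1p(-q)
        nll -= log_p                  # accumulate -log p(v_i | v_<i)
        a += W[:, v_i]                # cheap update: add the word's embedding
    return nll, g(a)
```

The tree factorization is what makes each word's probability cost O(log V) sigmoids instead of a full softmax over the vocabulary, and the running activation `a` lets every prefix hidden layer be computed in a single pass over the document.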
Open Source Code: No
The paper contains no explicit statement about the public availability of source code and provides no link to a code repository.
Open Datasets: Yes
Two datasets are used for the experiments in this section: a small one, 20 Newsgroups, and a relatively big one, RCV1-V2 (Reuters Corpus Volume I). The 20 Newsgroups corpus has 18,786 documents (postings) partitioned into 20 classes (newsgroups). RCV1-V2 is a much bigger dataset, composed of 804,414 documents (newswire stories) manually categorized into 103 classes (topics).
Dataset Splits: Yes
The setup consists of 11,284 and 402,207 training examples for 20 Newsgroups and RCV1-V2, respectively. We randomly extract 1,000 and 10,000 documents from the training sets of 20 Newsgroups and RCV1-V2, respectively, to build a validation set. The average perplexity per word is used for comparison; it is estimated on the first 50 documents of a separate test set.
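The average per-word perplexity can be sketched as follows, assuming the standard formula exp(-(1/T) Σ_t (1/|v_t|) log p(v_t)), where T is the number of held-out documents and |v_t| the length of document t; the function and argument names are illustrative.

```python
import math

def average_perplexity(log_probs, doc_lengths):
    """Average per-word perplexity over held-out documents.

    log_probs[t]   -- log p(v_t), the model's log-likelihood of document t
    doc_lengths[t] -- |v_t|, the number of words in document t
    """
    T = len(log_probs)
    per_word = sum(lp / n for lp, n in zip(log_probs, doc_lengths))
    return math.exp(-per_word / T)
```

For example, a single 4-word document in which the model assigns each word probability 1/10 yields a perplexity of exactly 10.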
Hardware Specification: No
The paper mentions an "efficient implementation on the GPU" but does not specify a GPU model or any other hardware used for the experiments.
Software Dependencies: No
The paper mentions using the "Adam optimizer (Kingma and Ba, 2014)" but does not list software dependencies with version numbers.
Experiment Setup: Yes
For these experiments, the hidden representations of all models are composed of 50-dimensional features. Early stopping on the validation set is used to avoid overfitting and for model selection. Our best Deep DocNADE models were trained with the Adam optimizer (Kingma and Ba, 2014) and the tanh activation function. The hyper-parameters of Adam were selected on the validation set.