Augmentable Gamma Belief Networks

Authors: Mingyuan Zhou, Yulai Cong, Bo Chen

JMLR 2016

Reproducibility Variable Result LLM Response
Research Type Experimental With extensive experiments in text and image analysis, we demonstrate that the deep GBN with two or more hidden layers clearly outperforms the shallow GBN with a single hidden layer in both unsupervisedly extracting latent features for classification and predicting heldout data.
Researcher Affiliation Academia Mingyuan Zhou (EMAIL), Department of Information, Risk, and Operations Management, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA; Yulai Cong (EMAIL) and Bo Chen (EMAIL), National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, Shaanxi 710071, China
Pseudocode Yes Algorithm 1: The PGBN upward-downward Gibbs sampler, which uses a layer-wise training strategy to train a set of networks, each of which adds an additional hidden layer on top of the previously inferred network, retrains all its layers jointly, and prunes inactive factors from the last layer. Algorithm 2: The upward-downward Gibbs samplers for the Ber-GBN and PRG-GBN are constructed by using Lines 1-8 shown below to substitute Lines 4-11 of the PGBN Gibbs sampler shown in Algorithm 1.
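The greedy layer-wise strategy of Algorithm 1 can be sketched as follows. This is a minimal structural illustration, not the authors' MATLAB implementation: `gibbs_sweep` is a placeholder that merely reports which top-layer factors stay active, whereas the real sampler performs a full joint upward-downward update of all layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sweep(widths):
    """Placeholder for one joint upward-downward Gibbs sweep.
    Returns a boolean mask over the top layer's factors marking those that
    received any counts (active factors); a real sampler would also resample
    the factor loadings and hidden units of every layer."""
    return rng.random(widths[-1]) > 0.1  # pretend ~90% of top factors stay active

def train_layerwise(K1_max, T, B, C):
    """Grow a T-layer network one layer at a time.
    K1_max : upper bound on the first hidden layer's width
    B[t-1], C[t-1] : burn-in and collection sweeps for the t-layer network"""
    widths = []
    for t in range(1, T + 1):
        # add a new top layer, its width bounded by the layer below it
        widths.append(K1_max if t == 1 else widths[-1])
        # retrain all layers jointly
        for _ in range(B[t - 1] + C[t - 1]):
            active = gibbs_sweep(widths)
        # prune inactive factors from the newly added top layer
        widths[-1] = int(active.sum())
    return widths

widths = train_layerwise(K1_max=400, T=3, B=[5, 5, 5], C=[5, 5, 5])
print(widths)  # inferred layer widths, non-increasing from bottom to top
```

The pruning step is what lets the higher layers shrink automatically, so only the first layer's width needs an explicit upper bound.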
Open Source Code Yes Matlab code will be available in http://mingyuanzhou.github.io/.
Open Datasets Yes We consider the 20newsgroups data set that consists of 18,774 documents from 20 different news groups, with a vocabulary of size K0 = 61,188. It is partitioned into a training set of 11,269 documents and a testing set of 7,505 ones. (http://qwone.com/~jason/20Newsgroups/) We consider both all the 18,774 documents of the 20newsgroups corpus, limiting the vocabulary to the 2000 most frequent terms after removing a standard list of stopwords, and the NIPS12 (http://www.cs.nyu.edu/~roweis/data.html) corpus whose stopwords have already been removed, limiting the vocabulary to the 2000 most frequent terms. We consider the MNIST data set (http://yann.lecun.com/exdb/mnist/), which consists of 60,000 training handwritten digits and 10,000 testing ones.
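The vocabulary restriction described above (remove stopwords, keep the most frequent terms) can be sketched in a few lines. The stop-word list and toy corpus below are stand-ins; the paper keeps the top 2000 terms, while this sketch keeps 4.

```python
from collections import Counter

STOPWORDS = {"the", "a", "from", "of", "and"}  # tiny stand-in stop list

def build_vocabulary(docs, vocab_size):
    """Keep the `vocab_size` most frequent terms after removing stopwords,
    mirroring the preprocessing described above (the paper uses 2000)."""
    counts = Counter(
        tok for doc in docs for tok in doc.lower().split() if tok not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(vocab_size)]

docs = [
    "the deep network extracts latent features from text",
    "a deep belief network models count data",
    "text features help classify the documents",
]
vocab = build_vocabulary(docs, 4)
print(vocab)  # the 4 most frequent non-stopword terms
```

Documents are then re-encoded as count vectors over this reduced vocabulary before training.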
Dataset Splits Yes It is partitioned into a training set of 11,269 documents and a testing set of 7,505 ones. We randomly choose 30% of the word tokens in each document as training, and use the remaining ones to calculate per-heldout-word perplexity.
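The 30% token-level split can be sketched as follows. Note the split is over individual word tokens within each document, not over documents or vocabulary terms; the helper below is a hypothetical illustration of that scheme.

```python
import numpy as np

def split_tokens(doc_counts, train_frac=0.3, seed=0):
    """Randomly assign `train_frac` of a document's word tokens to training
    and the rest to heldout, at the token (not term) level.
    doc_counts: 1-D array of word counts over the vocabulary."""
    rng = np.random.default_rng(seed)
    tokens = np.repeat(np.arange(len(doc_counts)), doc_counts)  # expand counts to tokens
    rng.shuffle(tokens)
    n_train = int(round(train_frac * len(tokens)))
    train = np.bincount(tokens[:n_train], minlength=len(doc_counts))
    heldout = np.bincount(tokens[n_train:], minlength=len(doc_counts))
    return train, heldout

counts = np.array([4, 0, 3, 3])       # 10 word tokens over a 4-term vocabulary
train, heldout = split_tokens(counts)
print(train + heldout)                # recovers the original counts
print(train.sum(), heldout.sum())     # 3 and 7 tokens (30% / 70%)
```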
Hardware Specification Yes Each iteration of jointly training multiple layers usually only costs moderately more than that of training a single layer, e.g., with K1 max = 400, a training iteration on a single core of an Intel Xeon 2.7 GHz CPU takes about 5.6, 6.7, 7.1 seconds for the PGBN with 1, 3, and 5 layers, respectively.
Software Dependencies No We use the L2 regularized logistic regression provided by the LIBLINEAR package (Fan et al., 2008) to train a linear classifier on θj in the training set and use it to classify θj in the test set, where the regularization parameter is five-fold cross-validated on the training set from {2^-10, 2^-9, . . . , 2^15}.
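The classification protocol quoted above can be sketched with scikit-learn's `liblinear` solver standing in for the LIBLINEAR package itself; the synthetic features below are placeholders for the inferred θj vectors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Regularization grid from the report: C in {2^-10, 2^-9, ..., 2^15},
# selected by five-fold cross-validation on the training set.
Cs = 2.0 ** np.arange(-10, 16)  # 26 candidate values

# Tiny synthetic stand-in for the theta_j feature vectors.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = LogisticRegressionCV(Cs=Cs, cv=5, penalty="l2", solver="liblinear")
clf.fit(X, y)
print(clf.C_)  # the cross-validated choice of C, drawn from the grid
```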
Experiment Setup Yes We set the hyper-parameters as a0 = b0 = 0.01 and e0 = f0 = 1. Given the trained network, we apply the upward-downward Gibbs sampler to collect 500 MCMC samples after 500 burn-ins to estimate the posterior mean of the feature-usage proportion vector at the first hidden layer (θj(1) normalized by its sum), for every document in both the training and testing sets. With the upper bound of the first layer's width set as K1max ∈ {25, 50, 100, 200, 400, 600, 800}, and Bt = Ct = 1000 and η(t) = 0.01 for all t, we use Algorithm 1 to train a network with T ∈ {1, 2, . . . , 8} layers. We set Ct = 500 and η(t) = 0.05 for all t; we set Bt = 1000 for all t if K1max ≤ 400, and set B1 = 1000 and Bt = 500 for t ≥ 2 if K1max > 400.
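The per-heldout-word perplexity used to score the heldout tokens is the standard exponentiated negative average heldout log-likelihood. A minimal sketch, assuming the model has already produced a predictive word distribution per document:

```python
import numpy as np

def per_heldout_word_perplexity(heldout_counts, pred_probs):
    """exp( -(1/N) * sum_{j,v} heldout_counts[j,v] * log pred_probs[j,v] ),
    where N is the total number of heldout word tokens.
    heldout_counts: documents-by-vocabulary heldout token counts
    pred_probs:     per-document predictive word distributions (rows sum to 1)"""
    N = heldout_counts.sum()
    return float(np.exp(-(heldout_counts * np.log(pred_probs)).sum() / N))

# Sanity check: uniform predictions over a 4-term vocabulary give perplexity 4.
counts = np.array([[2, 1, 0, 1],
                   [0, 3, 2, 0]])
probs = np.full((2, 4), 0.25)
print(per_heldout_word_perplexity(counts, probs))  # ~4.0
```

Lower perplexity means the model assigns higher probability to the 70% of tokens held out from each document.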