Revisiting Topic-Guided Language Models

Authors: Carolina Zheng, Keyon Vafa, David Blei

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type | Experimental | In this section, we detail the reproducibility study and results. We also investigate the quality of learned topics and probe the LSTM-LM's hidden representations to find the amount of retained topic information.
Researcher Affiliation | Academia | Carolina Zheng, Department of Computer Science, Columbia University; Keyon Vafa, Department of Computer Science, Columbia University; David M. Blei, Department of Statistics and Department of Computer Science, Columbia University
Pseudocode | No | The paper describes models and their components using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We make public all code used for this study (https://github.com/carolinazheng/revisiting-tglms).
Open Datasets | Yes | We use four publicly available natural language datasets: APNEWS, IMDB (Maas et al., 2011), BNC (Consortium, 2007), and WikiText-2 (Merity et al., 2017). We follow the training, validation, and test splits from Lau et al. (2017) and Merity et al. (2017).
Dataset Splits | Yes | We follow the training, validation, and test splits from Lau et al. (2017) and Merity et al. (2017). Table 4 shows the dataset statistics. The data is preprocessed as follows: for WikiText-2, we use the standard vocabulary, tokenization, and splits from Merity et al. (2017).
Hardware Specification | Yes | The models in our codebase train to convergence within three days on a single Tesla V100 GPU. rGBN-RNN, trained using its public codebase, trains to convergence within one week on the same GPU. The experiments can be replicated on an AWS Tesla V100 GPU with 16 GB of GPU memory.
Software Dependencies | Yes | LSTM-LM, TopicRNN, VRTM, and TDLM are implemented in our codebase in PyTorch 1.12. We use the original implementation of rGBN-RNN, which uses TensorFlow 1.9.
Experiment Setup | Yes | For all LSTM-LM baselines, we use a hidden size of 600, word embeddings of size 300 initialized with Google News word2vec embeddings (Mikolov et al., 2013), and dropout of 0.4 between the LSTM input and output layers (and between the hidden layers for the 3-layer models). We train the RNN components using truncated backpropagation through time with a sequence length of 30. Following Lau et al. (2017), Rezaee & Ferraro (2020), and Guo et al. (2020), we use the Adam optimizer with a learning rate of 0.001 on APNEWS, IMDB, and BNC. For WikiText-2, we follow Merity et al. (2017) and use stochastic gradient descent; the initial learning rate is 20 and is divided by 4 when validation perplexity is worse than the previous iteration. The models are trained until validation perplexity does not improve for 5 epochs, and we use the best validation checkpoint. We train all models on single GPUs with a language model batch size of 64. We train LDA via Gibbs sampling using Mallet (McCallum, 2002). The hyperparameters are: α (topic density) = 50, β (word density) = 0.01, number of iterations = 1000.
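The WikiText-2 schedule described in the setup row (SGD starting at a learning rate of 20, divided by 4 whenever validation perplexity is worse than the previous epoch, with training stopped after 5 epochs without a new best) can be sketched as follows. This is a minimal illustration of the stated rule, not code from the paper's repository; the function name and simulation harness are assumptions.

```python
import math

def run_schedule(val_ppls, lr=20.0, decay=4.0, max_patience=5):
    """Simulate the learning-rate schedule over a list of per-epoch
    validation perplexities. Returns (final_lr, epochs_run)."""
    best = math.inf       # best validation perplexity seen so far
    prev = math.inf       # previous epoch's validation perplexity
    patience = 0          # epochs since the last improvement
    for epoch, ppl in enumerate(val_ppls, start=1):
        if ppl >= prev:   # worse than the previous epoch: decay the lr
            lr /= decay
        if ppl < best:    # new best checkpoint: reset patience
            best, patience = ppl, 0
        else:
            patience += 1
            if patience >= max_patience:
                return lr, epoch   # stop: no improvement for 5 epochs
        prev = ppl
    return lr, len(val_ppls)
```

For example, a run whose validation perplexity improves twice and then worsens for five straight epochs decays the learning rate five times (20 / 4^5) before stopping, while a run that keeps improving never decays it.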