Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable inference of topic evolution via models for latent geometric structures
Authors: Mikhail Yurochkin, Zhiwei Fan, Aritra Guha, Paraschos Koutris, XuanLong Nguyen
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study ability of our models to learn the latent temporal dynamics and discover new topics that change over time. Next we show that our models scale well by utilizing temporal and group inherent data structures. We also study hyperparameters choices. We analyze two datasets: the Early Journal Content (http://www.jstor.org/dfr/about/sample-datasets), and a collection of Wikipedia articles partitioned by categories and in time according to their popularity. |
| Researcher Affiliation | Collaboration | Mikhail Yurochkin IBM Research EMAIL Zhiwei Fan University of Wisconsin-Madison EMAIL Aritra Guha University of Michigan EMAIL Paraschos Koutris University of Wisconsin-Madison EMAIL Xuan Long Nguyen University of Michigan EMAIL |
| Pseudocode | Yes | Algorithm 1 Streaming Dynamic Matching (SDM) |
| Open Source Code | Yes | Code: https://github.com/moonfolk/SDDM |
| Open Datasets | Yes | We analyze two datasets: the Early Journal Content (http://www.jstor.org/dfr/about/sample-datasets), and a collection of Wikipedia articles partitioned by categories and in time according to their popularity. Dataset construction details are given in the Supplement. |
| Dataset Splits | No | The paper mentions setting aside data for 'testing purposes' and evaluating 'perplexity scores on the held out data', implying a training set. However, it does not provide explicit details about training/validation/test splits, such as specific percentages, sample counts, or a clear three-way partitioning methodology for reproduction. |
| Hardware Specification | No | The paper mentions '20 cores' in Table 1 for DM and SDDM models but does not provide specific hardware details such as CPU/GPU models, memory, or other detailed system specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | In the preceding experiments we set τ0 = 2, τ1 = 1, γ0 = 1 for SDM; τ1 = 2, γ0 = 1 for DM; τ0 = 4, τ1 = 2, γ0 = 2 for SDDM. |
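The quoted hyperparameter settings can be collected into a small configuration sketch. This is purely illustrative: the key names `tau0`, `tau1`, and `gamma0` mirror the paper's symbols τ0, τ1, γ0, and the dictionary layout is an assumption, not part of the authors' released code.

```python
# Hypothetical per-model configuration mirroring the settings quoted in the
# "Experiment Setup" row above; key names shadow the paper's tau_0, tau_1,
# gamma_0 symbols and are not an official API of the SDDM repository.
HYPERPARAMS = {
    "SDM":  {"tau0": 2, "tau1": 1, "gamma0": 1},
    "DM":   {"tau1": 2, "gamma0": 1},   # no tau0 value is quoted for DM
    "SDDM": {"tau0": 4, "tau1": 2, "gamma0": 2},
}

def get_config(model: str) -> dict:
    """Return the quoted hyperparameter configuration for a given model."""
    return HYPERPARAMS[model]
```

For example, `get_config("SDDM")` would yield `{"tau0": 4, "tau1": 2, "gamma0": 2}`, matching the values reported for SDDM.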