Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable inference of topic evolution via models for latent geometric structures
Authors: Mikhail Yurochkin, Zhiwei Fan, Aritra Guha, Paraschos Koutris, XuanLong Nguyen
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study ability of our models to learn the latent temporal dynamics and discover new topics that change over time. Next we show that our models scale well by utilizing temporal and group inherent data structures. We also study hyperparameters choices. We analyze two datasets: the Early Journal Content (http://www.jstor.org/dfr/about/sample-datasets), and a collection of Wikipedia articles partitioned by categories and in time according to their popularity. |
| Researcher Affiliation | Collaboration | Mikhail Yurochkin IBM Research EMAIL Zhiwei Fan University of Wisconsin-Madison EMAIL Aritra Guha University of Michigan EMAIL Paraschos Koutris University of Wisconsin-Madison EMAIL Xuan Long Nguyen University of Michigan EMAIL |
| Pseudocode | Yes | Algorithm 1 Streaming Dynamic Matching (SDM) |
| Open Source Code | Yes | Code: https://github.com/moonfolk/SDDM |
| Open Datasets | Yes | We analyze two datasets: the Early Journal Content (http://www.jstor.org/dfr/about/sample-datasets), and a collection of Wikipedia articles partitioned by categories and in time according to their popularity. Dataset construction details are given in the Supplement. |
| Dataset Splits | No | The paper mentions setting aside data for 'testing purposes' and evaluating 'perplexity scores on the held out data', implying a training set. However, it does not provide explicit details about training/validation/test splits, such as specific percentages, sample counts, or a clear three-way partitioning methodology for reproduction. |
| Hardware Specification | No | The paper mentions '20 cores' in Table 1 for DM and SDDM models but does not provide specific hardware details such as CPU/GPU models, memory, or other detailed system specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | In the preceding experiments we set τ0 = 2, τ1 = 1, γ0 = 1 for SDM; τ1 = 2, γ0 = 1 for DM; τ0 = 4, τ1 = 2, γ0 = 2 for SDDM. |
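The quoted hyperparameter settings can be collected into a small configuration sketch. This is purely illustrative: the key names `tau0`, `tau1`, and `gamma0` mirror the paper's symbols τ0, τ1, γ0, and the dictionary layout is an assumption, not part of the authors' released code.

```python
# Hypothetical per-model configuration mirroring the settings quoted in the
# "Experiment Setup" row above; key names shadow the paper's tau_0, tau_1,
# gamma_0 symbols and are not an official API of the SDDM repository.
HYPERPARAMS = {
    "SDM":  {"tau0": 2, "tau1": 1, "gamma0": 1},
    "DM":   {"tau1": 2, "gamma0": 1},   # no tau0 value is quoted for DM
    "SDDM": {"tau0": 4, "tau1": 2, "gamma0": 2},
}

def get_config(model: str) -> dict:
    """Return the quoted hyperparameter configuration for a given model."""
    return HYPERPARAMS[model]
```

For example, `get_config("SDDM")` would yield `{"tau0": 4, "tau1": 2, "gamma0": 2}`, matching the values reported for SDDM.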